EXAMINING EXAMS – April 2023

Over 300 years since the first written exam was used in the English education system, this traditional form of assessment continues to divide opinion. To their supporters, written exams provide a rigorous test of students’ knowledge and understanding that acts as a source of motivation as well as a sound basis for progression onto university or employment. Indeed, Prime Ministers, Education Secretaries, Schools Ministers and regulators have publicly stated that written exams are the ‘best and fairest’ way to measure pupils’ attainment. Meanwhile, critics argue that written exams are narrow assessments that focus too much on memorisation and fail to provide students with the wide range of skills that they need for later life and work.

With a General Election looming, coupled with the collapse of the exam system in 2020 and 2021 due to the outbreak of COVID-19, debates over the future of exams have become increasingly vocal. As a result, this report set out to understand if the current dominance of written exams in our assessment landscape is justified and whether the following alternatives to exams could and should play a greater role in our high-stakes assessment system towards the end of secondary education – most notably at age 18:

Coursework and controlled assessments
Oral exams
Portfolios
Extended essays and projects
Performance-based assessments

Developing and demonstrating a wide range of skills

A common criticism of written exams is that they focus too heavily on recalling knowledge, whereas other methods of assessment can emphasise other competencies. For example, the Extended Project Qualification (EPQ) – a voluntary, independently produced essay or project completed alongside A-levels – encourages students to investigate a topic of their choice, with the aim of developing their research, extended writing and presentation skills. Meanwhile, oral assessments (such as the speaking components of language exams) give students the opportunity to demonstrate their knowledge in a more practical way while also seeking to improve their verbal communication skills.

Developing wider skills through different methods of assessment is not just a theoretical goal. This report identified several studies showing that a student’s grade on the same course material may be different in an oral assessment or a portfolio (a collection of work, often used to assess subjects such as design and technology) compared to a written exam, indicating that these alternative assessments may be capturing different elements of performance. Using ‘multi-modal’ assessment to get a broader view of a student’s capabilities is common in technical education and apprenticeships but rarely features in academic settings.

Assessments that reflect ‘real-world’ settings

Written exams are normally completed in an artificial environment (such as a silent hall) that does not reflect real-world settings. In contrast, some alternative assessments allow students to acquire and demonstrate skills needed for employment and further study. For example, there is evidence showing that students who complete the EPQ may be better prepared for university degrees, while oral assessments promote the verbal communication skills that employers frequently claim are lacking among many school and college leavers.

The nature of some subjects means that assessments which closely resemble real-world settings are undoubtedly preferable to written exams. For example, the most appropriate way to assess a student’s musical skills is through a live performance, while other artistic abilities such as drawing and painting are best captured through a portfolio of work. Although there is evidence to show that assessing creative subjects inevitably involves a greater degree of subjectivity (and thus less consistent grading) than a written exam, performances and portfolios remain the most credible way of capturing a student’s attainment in these subjects.

Guarding against malpractice

When they were first examined in 1988, GCSEs often had a large coursework component. Just three years later, then Prime Minister John Major voiced concerns that standards were “at risk” with some students allegedly getting too much assistance from teachers or parents or even having their coursework written for them. To contain the risk of malpractice, there was a shift in the mid-2000s from coursework to ‘controlled assessments’ i.e. coursework completed under supervised conditions, with much tighter controls on its design, delivery and marking. Even so, a review in 2013 by the exam regulator Ofqual found that there were still “too many opportunities for plagiarism” and that, in some subjects, there was very little to distinguish between a controlled assessment and a written exam due to the tight controls. Consequently, Ofqual severely curtailed the use of ‘non-exam assessment’ (NEA) including coursework-style tasks. Many GCSEs and A-levels (e.g. history, geography, drama) have seen significant reductions in the contribution of NEA towards a student’s final grade, and in some subjects (e.g. science) NEA has been eliminated altogether.

Despite these changes to GCSEs and A-levels, concerns over malpractice in a high-stakes assessment system persist in other forms of assessment such as the EPQ and the International Baccalaureate’s (IB) compulsory ‘Extended Essay’, which are both completed without supervision. The development of new technology such as ChatGPT and other chatbots has exacerbated existing concerns regarding plagiarism as these tools can produce entire essays and projects with minimal input (if any) from the student. Such is the inability of exam boards to identify malpractice related to chatbots that the IB recently announced that students were actually allowed to use such software to complete their Extended Essays. In contrast, the scope for malpractice in written and oral exams is greatly reduced by the controlled testing environment, thus making the grades awarded for these assessments more trustworthy.

The practicality of assessments in a high-stakes system

The inconsistencies in how coursework and controlled assessments were delivered in schools and colleges created numerous problems when seeking to award grades on a fair basis across the country, particularly when some students ended up receiving more advice and assistance than their peers (even within the same institution). Although controlled assessments sought to enforce more specific rules on the level of permitted help for students, teachers reported that there was still too much room for interpreting the rules differently and Ofqual found that in some cases “too much teacher input” continued. Extended essays and projects such as the EPQ continue to face the same challenges. For example, students must write their own research questions for their EPQ but teachers are allowed to provide feedback on them – meaning some students could be receiving more support than others, potentially giving them an unfair advantage. Written exams largely avoid these problems by ensuring that all students taking the exam receive the same questions in a standardised and strictly controlled environment, so the results should reflect a student’s genuine attainment rather than being influenced by the amount of support that they received.

Written exams are also relatively cheap to deliver and mark, which is hugely beneficial when assessing tens (if not hundreds) of thousands of students over a short period. In comparison, coursework and controlled assessments were both very time-consuming – often taking several months to complete – and used up a large portion of the curriculum time for each subject. Moreover, they increased the workload of teachers who had to supervise and mark the tasks. Other methods of assessment also need a considerable amount of time to ensure that they produce a credible measure of student attainment. Research has shown that the consistency of marks awarded for portfolio assessments improves when multiple assessors mark each collection of work produced by a student (as is commonly done in smaller subjects such as art and design), but this intensive approach would quickly become unfeasible at a larger scale.

The challenges caused by asking teachers to award grades

Students often appear to perform better in assessments such as coursework and controlled assessments that are graded by their teacher rather than external examiners, yet research by Ofqual found that this was not necessarily a “fair representation” of a student’s attainment. Asking teachers to award grades for these assessments also made it hard to differentiate between students because of a ‘bunching’ towards the top end of the available marks. This was perhaps unsurprising as teachers reported being in a “difficult, sometimes stressful” position when marking their students’ work as they knew that “their own performance and that of the school” would be judged by the results. The EPQ, which is marked by teachers, has seen a similar bunching of marks, with 45 per cent of candidates being awarded an A or A* in 2019 (rising to 55 per cent during the pandemic) compared to 25 per cent across all A-levels.

Written exams typically produce grades that expose different levels of attainment within a cohort of students because they are externally marked and designed to test students’ knowledge of externally set content. These normal safeguards disappeared during the pandemic, leaving teachers with the unenviable task of determining their own students’ grades. The proportion of A and A* grades awarded across all A-level subjects subsequently leapt from 25 per cent in 2019 to 44 per cent in 2021, with sharp rises in top grades also visible in GCSEs. Teachers were put under immense pressure during this period and reported frequently working late into the night to manage their substantial workload due to this enforced experiment with teacher-assessed grades. A survey by Ofqual in 2021 found that less than 40 per cent of the public had confidence in A-level and GCSE grades awarded during the pandemic, further emphasising the risks created by a grading system that does not produce trustworthy outcomes. When coupled with the findings of numerous inquiries and studies conducted well before the pandemic, the research evidence clearly demonstrates why asking teachers to award grades to their students should be avoided within a high-stakes assessment system.

Widening of disparities between groups

Numerous research studies have found that when teachers are asked to award grades to their students, they can be influenced by their existing knowledge of that student. For instance, a teacher may inadvertently award a piece of work a higher grade than it deserves because the student in question is generally a high achiever. Teachers can also be influenced by a pre-conceived (and often subconscious) idea of how well a student may perform based on demographic factors, with multiple studies showing that grades awarded by teachers can be lower for children from less well-off families and those with special educational needs compared to other children of the same ability level. When teachers were responsible for awarding grades during the pandemic, concerns over widening disparities between students were again evident as some existing performance gaps between students from different demographic groups increased – particularly for black students and those from lower socio-economic backgrounds.

This so-called ‘bias’ is not intentional, nor is it unique to teachers, as even external assessors can be biased in their judgements. For example, one study found that during musical performances an assessor’s mark can be influenced by the gender and ethnicity of the performer as well as the experiences of the assessor themselves (e.g. how familiar they are with the piece being performed). In contrast, written exams reduce the opportunity for bias to occur as they are marked anonymously as well as externally. Other methods of assessment can also limit the opportunities for bias by using anonymous marking (as is done for marking A-level and GCSE music performances as well as the IB Extended Essay).

Different approaches to marking

Written exams are generally judged in a consistent way as assessors follow the same mark scheme that sets out the knowledge and skills required from candidates. This approach makes it more likely that if a student were to take the exam again, they would achieve an identical or very similar grade. Some alternative methods of assessment struggle to achieve the same consistency, as demonstrated by the pandemic-era experiment with teacher-assessed grades. Research has shown that consistent grading can also be difficult to achieve when the assessor is asked to make a more subjective judgement (e.g. art portfolios or drama performances) because even with a mark scheme, assessors may value different skills, traits and styles in creative outputs.

Several studies have described ways to improve consistency between assessors. One of the most important findings is that there is greater consistency between teachers when they are asked to rank students in order of how well they performed rather than awarding specific grades. Another way to improve consistency is by asking the assessor to make their judgement in a different way. Studies have found that asking assessors to award a single holistic score to a piece of work in the absence of any formal marking criteria can often produce more consistent grades than asking assessors to judge the same work using a detailed and prescriptive mark scheme.

Conclusion

There are good reasons why written exams have come to dominate the assessment system in schools and colleges. They are a relatively low-cost, standardised and impartial way to assess students’ knowledge and understanding, with a much lower probability of being affected by malpractice or inconsistent grading than other methods of assessment. The controlled setting in which exams normally take place also means that students, parents, universities, employers and the government can have confidence that the awarded grades are a genuine reflection of a student’s attainment. Any reforms that may result in greater inaccuracies or inconsistencies in grading would be detrimental to students as well as taxpayers who have every right to expect a publicly funded system to deliver fair judgements on students. Nevertheless, every method of assessment involves trade-offs and written exams are no exception, particularly their limited value in building many skills that are useful beyond the confines of an exam hall.

Regardless of the imperfections of written exams, the problems faced by many alternative forms of assessment are hard to ignore. The advent of ChatGPT is a significant threat to the integrity of formal assessments in this country and elsewhere. Plagiarism has always been a risk to some extent, especially for coursework-style tasks, but establishing for certain whether a student produced the work that they submitted has now become a virtually impossible task for teachers, leaders and exam boards. Consequently, it would be unwise to increase the proportion of coursework or similar assessments in our high-stakes system because there is no realistic prospect of preventing widespread malpractice. What’s more, the burden that would be placed on teachers by switching from external exams to more internal assessments should not be underestimated given what teachers have consistently reported in the past.

This report concludes that both supporters and critics of written exams make valid arguments regarding the benefits and drawbacks of this enduring form of assessment. As a result, the following recommendations seek to build on the most commendable attributes of written exams while also drawing on the benefits of other types of assessment that can withstand the demands of our high-stakes assessment system. If, as this report proposes, a government – either current or future – is willing to invest more in schools and colleges to ensure that every institution can offer a wider range of courses and assessments, our assessment system will be placed on a stronger foundation for many years to come.

Recommendations

RECOMMENDATION 1: To maintain the credibility of the high-stakes assessment system in the final years of secondary education, written examinations should continue to be the main method of assessing students’ knowledge and understanding. In contrast, placing a greater emphasis on coursework and other forms of ‘teacher assessment’ would increase teachers’ workload and lead to less reliable grades that may be biased against students from disadvantaged backgrounds.
RECOMMENDATION 2: To broaden the curriculum and develop a wider range of skills than those promoted by written exams, students aged 16-19 taking classroom-based courses should be required to take one additional subject in Year 12 (equivalent to an AS level) that will be examined entirely through an oral assessment.
RECOMMENDATION 3: To ensure that students taking classroom-based subjects can develop their research and extended writing skills beyond an exam setting, the Extended Project Qualification (EPQ) should be made compulsory. In future, the EPQ will be used as a low-stakes skills development programme and will therefore be ungraded.
RECOMMENDATION 4: To give schools and colleges the resources they need to expand their 16-19 curriculum to include an additional subject and the EPQ, the ‘base rate’ of per-student funding (currently £4,642) should be increased by approximately £200 a year to reach £6,000 by 2030.

Sky News – ChatGPT will make marking coursework ‘virtually impossible’ and shows exams ‘more important than ever’

The Times – Oral exams could weed out artificial intelligence cheats

Independent – Coursework ‘will deliver less trustworthy grades than exams’ in age of ChatGPT

TES – Teacher assessment ‘impossible’ amid ChatGPT rise

Schools Week – Think tank proposes plan to cut AI chatbot cheating
