Awarding grades in 2021: Quality assurance by design

Date: 15th Mar 2021 | Author: Nick Allen | Categories: Policy and News, Teaching

Colleges are faced with a little bit of a challenge. The production of centre assessed grades essentially involves each college becoming its own awarding body and performing the functions of that awarding body in a timeframe which would never be imposed in real life. Those functions must be performed in a way that stands up to outside scrutiny, yet without the set of protections that the Ofqual and JCQ regulations governing awarding bodies afford. One could mention the other concurrent challenges, such as maintaining high quality teaching and learning in extraordinary and fast-moving circumstances, managing the return to college, and providing mass testing to thousands of students. This paper, though, focuses on awarding in 2021, and on just one aspect of the challenges we face: developing an effective evidence base to underpin awarding.

In essence, there are two processes that colleges need to get right. Each subject needs to define the evidence base that it intends to use in awarding, and then this needs to be implemented in a rigorous, standardised, moderated fashion, which will stand up to outside scrutiny. In achieving these aims, it is perhaps the definition of the evidence base that is the aspect that needs the most careful consideration. If we get this wrong, then we stand little chance of getting awarding right.

Colleges have been given significant flexibilities in how they develop the evidence base. Use of the assessment materials produced by awarding bodies will be optional, and centres can draw on evidence produced at any point in the course. There is perhaps one phrase which is worth bearing in mind as we design our approaches: we need to assess what students know, what they understand and what they can do at the close of the course.

This freedom to develop our own evidence base is a double-edged sword. It is a bit like telling students to write an essay with the freedom to invent their own title, with an end-point where the teacher will judge the piece of work on whether the title was appropriate or not. It is the range of what might be considered evidence that makes this so challenging. 

Some sources of evidence should be given more weight than others. Assessments completed under exam conditions, with extra time and all other access requirements in place, provide a more reliable source of evidence than assignments completed at home, in unsupervised conditions with unlimited time and access (or not) to parents, tutors and the internet. Producing work at home during the pandemic adds an additional dimension here. It is entirely possible that the quality of work produced reflects the circumstances of its creation, rather than what students know, what they understand and what they can do.

To this we add the fundamental challenge that we may have marks and grades for many pieces of work generated over the two years of a course, but few of these pieces were generated with the intention of contributing towards a final grade. This has huge consequences. The disposition of the student towards the piece of work would not reflect the fact that it was being used in awarding. There is also the question of the extent to which a piece of work reflects a valid test of A level or GCSE ability. It is possible that an assessment uses ‘real’ A level materials, is marked appropriately using genuine mark scheme materials, but does not provide an appropriate test. Taking this further, imagine a subject with a range of question types, with different marks available for each type of question. A student may complete a number of homework pieces and repeatedly achieve impressive marks for a particular style of question, but this does not necessarily indicate that this student would do well at A level, as few subjects have papers which simply involve the repetition of exactly the same style of question. Furthermore, few subject teams would standardise and moderate each piece of work that is set, which would impact on the reliability of the marks and grades in the student mark-book. Many assessments are not even intended to be summative: there may be entirely different motivations for the setting of a particular task, such as allowing students to develop, testing how their background knowledge has developed (in the absence of intense revision), and so forth.

The chaotic nature of pre-existing evidence has two distinct consequences. The first is that we need to complete a very clear sifting and sorting process to identify which mark-book grades are suitable for use in a basket of evidence for awarding. The second relates to students, and the vision they have of their performance. The experience of appeals last year suggests that students (and parents) will draw on almost any grade ever given as evidence of the heights that a student was likely to scale. We need to be absolutely clear which pieces of work are being used in the primary evidence base, and why these have been selected, and make sure students know what has been used, and how they performed.

In developing the plan in each subject, there needs to be a shared understanding of the significance of each of the evidence points. The guidance from the Department for Education[1] recommends that the following range of evidence can be used, where available:

  • student work produced in response to assessment materials provided by the exam board, including groups of questions, past papers or similar materials such as practice or sample papers
  • non-exam assessment (NEA) work (often referred to as coursework), even if this has not been fully completed
  • student work produced in centre-devised tasks that reflect the specification, that follow the same format as exam board materials and have been marked in a way that reflects exam board mark schemes - this can include:
    • substantial class or homework (including those that took place during remote learning)
    • internal tests taken by pupils
    • mock exams taken over the course of study
  • records of a student’s capability and performance over the course of study in performance-based subjects such as music, drama and PE
  • records of each student’s progress and performance over the course of study

Thus far our discussions of evidence have focused on what we should do with the evidence we already have. The other part of the discussion that needs to be had is what evidence we still need to gather. Broadly speaking, for assessment to be credible the basket of evidence needs to cover as much of the specification as is possible. While sixth form college students have had varying experiences of the pandemic as a result of both local and individual circumstances, I have not heard of many colleges that are behind in delivering the specification, or that think what has been offered has not been of high quality. The need to complete assessment in May will bring forward study leave for many. This may trim a few weeks from curriculum delivery, but it is unlikely that there would be huge sections of the specification remaining incomplete. As well as developing an evidence basket that embraces the whole of the course, we should also try to ensure that we cover the assessment objectives and embrace the assessment instruments which are deployed in a normal year. If there is non-examined assessment, then this should be used to the extent that it has been possible to produce it in pandemic circumstances (easier for English Literature than Product Design), but only to the extent that non-examined assessment contributes to the grade in a normal year (if it is 20% of the final grade in a normal year, it is 20% this year). Assessment in languages should cover oral and listening skills as well as written, and in music, performance and composition should be assessed alongside the set texts.

Selecting an appropriate evidence base is one thing: interpreting it correctly is another thing entirely. One of the issues presented this year is that centres have been given huge flexibility on how they implement the assessments to be used in the awarding process. The same task could be delivered in very different circumstances across different centres. The awarding bodies can provide a mark scheme and grade boundaries, but using these is reduced to absurdity if students in one college have completed the task as a closed book exam, and others have seen the questions in advance, had significant coaching, and completed the assessment with access to their notes and other people. A student who scores 40/60 in exam conditions has done significantly better than a student securing 40/60 in open book conditions. To suggest that external quality assurance based on scrutiny of marking can cope with the range of delivery methods that are available is somewhat fanciful.

Ultimately, in this context it is incumbent upon us to implement what we do in our own colleges with honesty, consistency and integrity: it is actually all we can do. We are in an imperfect situation, and what is important here is that we take the time to understand the nuances of the evidence we have.

There is also the not inconsiderable question of flightpath to navigate. We might have a set of grades marked perfectly to A level standards from across the course, but we still need to be careful. Over a two-year course, students (almost always) improve. We must take this trajectory into account when considering our teacher assessed grade. For example, if a subject elected to use a mock exam taken at the end of the lower sixth year as part of its evidence base, it would not be unusual for students to secure a lower grade in that exam than they might a year later in a final exam. Take for example a subject with six evidence points over the two years (in the form of full, high-stakes controlled tests). In sequence, a student has grades of C, C, B, B, A, A in these assessments. An average grade would put this student at a B grade, but it would be entirely reasonable for the department to consider an A grade. If they feel the evidence at the end of the course is substantial enough to award the A grade then that grade should be awarded. Added to this we have the particularly knotty problem of grade reliability. We know that assessments are not a precise measure of attainment. There is a range of marks that could legitimately be awarded by different examiners for a particular piece of work and, at the margins, these might represent different grades.
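The arithmetic behind the C, C, B, B, A, A example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the points scale (E = 1 up to A = 5) and the function name `mean_grade` are hypothetical and are not taken from any awarding body or from the guidance. The sketch shows why a simple average lands on a B while the most recent evidence points to an A; it is not a method for determining grades, which remains a matter of professional judgement.

```python
# Hypothetical points scale, assumed purely for illustration.
# It is not an official tariff or an awarding body conversion.
GRADE_POINTS = {"E": 1, "D": 2, "C": 3, "B": 4, "A": 5}
POINTS_GRADE = {points: grade for grade, points in GRADE_POINTS.items()}


def mean_grade(grades):
    """Return the grade nearest to the simple mean of the evidence points."""
    points = [GRADE_POINTS[g] for g in grades]
    mean = sum(points) / len(points)
    # Round half up to the nearest whole point on the assumed scale.
    return POINTS_GRADE[int(mean + 0.5)]


# The six evidence points from the example, in chronological order.
evidence = ["C", "C", "B", "B", "A", "A"]

print(mean_grade(evidence))       # simple average across the whole course
print(mean_grade(evidence[-2:]))  # the two most recent evidence points only
```

With the assumed scale, the first call gives B and the second gives A, which is the gap between an averaged view and a view that takes the student's trajectory into account.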

We must be wary of any methodology which is simply the averaging of a series of evidence points. Each of the evidence points will tell us something, but it is not necessarily the case that each point tells us the same thing. Some evidence points will be a better indicator of final performance than others. This is where professional judgement comes in. We are looking at the basket of evidence in the context of all we know about a student, and all we know about students we have encountered in this and previous years, in coming to a view about the standard a student is working at.

Before we conclude, we ought to return to the issue of communicating with students, and how we manage the process in a way which is robust and appropriately transparent. It is expected (read required) that students know which sources of evidence have been used as part of the assessment of their grades, and how well they have performed in these different evidence points. Note that early guidance suggested that we should retain the original pieces of work for all items used in the assessment of final grades, but JCQ have now conceded that this will not be possible in all circumstances. While one might have an instinct to complete this process in secrecy to avoid scrutiny, transparency actually has benefits for us as well. Having a clear and carefully considered evidence base which is understood by teachers, students and parents provides a firm foundation for internal quality assurance and for responding to external quality assurance and appeals.

This paper has focused on what makes the production of teacher assessed grades a challenging process, but this is a challenge that colleges are already well-equipped to meet. Having a clear understanding of what the issues are and the limitations of evidence provides us with the opportunity to chart a path through these difficulties and develop a process that matches the ways that courses have been delivered in our colleges and responds to the nuances of internal assessment.

At the time of writing, the guidance we have to help us through this process is not complete. Most colleges will need to finish the preliminary assessing of grades in around two months if they are to complete internal quality assurance and upload to awarding bodies in time for the 18 June deadline, yet we still await the outcomes of Ofqual’s technical consultation and the detailed guidance from awarding bodies for each specification. We cannot wait until all this is available before commencing our plans and processes. While we face a remarkable challenge this year, it is a process that we can navigate.


Nick Allen is vice principal at Peter Symonds College and leads the Six Dimensions project, an in-depth yearly analysis of academic and progression data for the sector as a whole and for individual SFCA members.
