No more marking or no more marks?

Date: 9th Oct 2023 | Author: Guest Author | Categories: Policy and News

This piece was initially published on Paul Cline's 'A Psychology Teacher Writes...' blog here

Marking and feedback is a complex beast, and one that accounts for a huge proportion of teacher workload. There have been significant shifts in thinking in recent years towards a more feedback-driven model and away from most teachers’ traditional conceptions of what marking looks like. This has been guided by research such as that described in the EEF report on effective feedback. One issue remains a problem in many places: the insistence on giving students a mark or grade on every, or almost every, piece of work. The research here is equivocal; the EEF report states that:

“Written methods of feedback, including written comments, marks, and scores, can improve pupil attainment; however, the effects of written feedback can vary.” 

In my own mission to make the marking and feedback I do more effective (see here and here) one cornerstone for me has been to remove marks or grades from the process almost completely. I’m coming at this from a Psychology perspective, and I’m thinking mostly about extended responses or essays. That said, I do think many of the arguments outlined here will apply to many other subject domains, and be relevant for short answer questions too. I think there are several reasons to support withholding marks (or just not giving them at all):

  • Scores are not very reliable
  • Grades are rarely valid
  • Marks don’t (mostly) give students useful information about their progress
  • Marks don’t help students improve

Scores are not reliable

I don’t trust my own marking. I’ve attended enough external INSET over the years to know that I won’t reliably score answers the same way as examiners. Even trained examiners work to a ‘tolerance level’ of disagreement, which is fine when scaled up across the whole qualification but is more significant when it’s just one piece of work. What if I give it 9/12 and another colleague reads it and suggests 8 or 10, or 7 or 11? There’s no objective way to determine who is correct, and I might even change my own mind on a second reading (we’ve all been in those standardisation meetings, right?). At what level is a student ‘happy’ with this discrepancy? Furthermore, exam boards don’t work to a consistent standard year-on-year. Feedback from mark schemes, examiners’ reports and training courses shows that they change their criteria over time (or, rather, they change the nuances of the way those criteria are interpreted and applied to students’ answers). What might have gained 7/8 in 2017 might only score 5/8 in 2019.

It’s virtually impossible to avoid inherent biases while marking. It might be possible to anonymise work to some extent, especially when marking in large volume, but then it’s less easy to give personalised feedback. Over time we get to know students’ handwriting or sometimes even phrasing and we are no more able than anyone else to avoid the effects that our knowledge of particular students has on our judgement of their work. We are also biased towards what we’ve taught them – we look for particular things that we’ve mentioned in lessons – it’s like a nice bit of validation for our teaching; that’s not to say that examiners would credit them in the same way.

Scores or percentages might be useful indicators but grades are not valid

Exam grades are calculated holistically on the basis of a much larger sample of the domain (across several exam papers) so there’s no meaningful way to extrapolate a score on a single piece of work to a ‘working at’ grade. Analysis of exam performance across papers and cohorts shows that students rarely perform consistently on extended answers. So while it might be true that if they typically score, say, 80% on essay questions then that is likely to lead to an A* grade, virtually no student actually performs this way in real exams.

But this extrapolation is what students and teachers do all of the time. They either figure out for themselves (based on analysis of some grade boundaries) – or teachers tell them – that 75% is equivalent to, say, a grade A and then apply that rule to all pieces of work, no matter how many marks it might be worth. 
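The arithmetic behind this habit can be sketched in a few lines of Python. This is only an illustration: the 75% threshold is the hypothetical ‘say’ figure used above, not a real grade boundary, and the function names are invented for the example. It shows how many percentage points a single mark is worth on questions of different sizes, and how the one-mark disagreement in the 9/12 example earlier flips the apparent ‘grade’.

```python
# Hypothetical illustration only: 75% is the article's "say" figure for an A,
# not a real grade boundary, and these helper names are made up.
ASSUMED_A_THRESHOLD_PCT = 75


def meets_threshold(score: int, total: int, threshold_pct: int = ASSUMED_A_THRESHOLD_PCT) -> bool:
    """True if score/total reaches the assumed percentage threshold.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return score * 100 >= threshold_pct * total


def points_per_mark(total: int) -> float:
    """Percentage points that one mark is worth on a question out of `total`."""
    return 100 / total


if __name__ == "__main__":
    # One mark matters far more on a short question than on a long one.
    for total in (8, 12, 16):
        print(f"One mark on a {total}-mark question = {points_per_mark(total):.1f} percentage points")

    # The 9/12 example: colleagues might reasonably give anything from 7 to 11.
    for score in range(7, 12):
        verdict = "A" if meets_threshold(score, 12) else "below A"
        print(f"{score}/12 = {100 * score / 12:.1f}% -> '{verdict}' under the 75% rule")
```

On a 12-mark question a single mark is worth just over 8 percentage points, so a disagreement of one mark between two markers, well within normal tolerance, is enough to move the same answer across the assumed boundary.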

I think most teachers understand this problem but don’t communicate clearly enough, and students (or parents) don’t listen. When a teacher says “This essay is A grade standard” what they really mean is:

If you wrote consistently to this standard, under exam conditions, on every question that comes up on the exam, then, based on some educated guesses about likely grade boundaries (which can’t possibly be known in advance), which I’m basing on the limited number of exam series that have happened since the start of this iteration of the specification (and don’t forget the massive disruption caused by Covid!) it’s very likely that your score would probably be enough to get you an A.

What the student hears, of course, is “You got an A.”

What about where we give grades on the basis of larger assessments? We might set an assessment for a particular unit and then give it a grade on the basis that it represents a more significant sample of the domain and therefore such extrapolation is valid. This is not true. When we give students a test, we aren’t wholly (or at all) concerned with what it tells us about their performance on that test, rather what conclusions it might allow us to draw about how well they might perform with material that wasn’t on the test (ie the final exams). The more that the content of the test is predictable (ie they know what will be on the test to a decent degree in advance) then the less valid those inferences become. Therefore the kind of smaller assessments that students typically complete throughout the year (eg end of unit tests) are not valid indicators of future performance because a) the domain sampled is too small and b) the content of the assessment is too predictable.

Scores on individual pieces of work don’t tell students ‘where they are’

Answers are too often completed ‘open-book’ and are rarely completed under timed conditions, so they are not a real indication of what students can actually do for themselves. We might get around this by confining our marks to only those pieces completed in class, where we have better control over the conditions under which they are completed. This represents a significant time investment, though, and for subjects with significant amounts of extended writing in their exams it is hard to balance time for such practice with the demands of curriculum coverage. However, whether completed at home or in class, answers are often scaffolded to some extent (possibly quite heavily) by teachers first, eg by writing a plan together beforehand, and are written on material literally just covered, which again makes the work unrepresentative of a real exam answer. To be clear, I think we should be scaffolding their answers in advance by sharing success criteria, giving models of excellence or completing planning together. These are all helpful to students, but they reduce the usefulness or meaning of any marks we may then assign to their work.

As noted above, marks on exam paper questions are designed to reflect the conditions under which the answers were written. Unsurprisingly, many students are capable of producing very high scoring answers in their own time, which bear no relation to what they might produce under exam conditions. Many students write way more than they possibly could in the time given during exams. You can’t award ‘bonus’ marks so how do you explain or identify at what point the answer has done enough to hit top band/marks? This may be possible with points-based questions but often does not apply to an extended response where you might need to see where the student ‘arrives’ at the end of their answer before making a judgement on the quality of their overall argument.

Scores or grades on single pieces of work rarely help students improve

What are students actually meant to do with this information? As noted above, if it doesn’t really tell them ‘where they are’ with any real degree of accuracy then how does this information help them get better? (beyond the motivational prod of realising how below standard their answers might be, although we can achieve this in our feedback without giving scores)

As with grades, the presence of numerical marks decreases the likelihood of students engaging with any feedback comments, which means the number is not going to help students get better. As one teacher commented in a staff survey we conducted on marking and feedback at our school:

“Most students are not bothered about listening to feedback – once they’ve got their mark – as they are either chuffed and not bothered, or cheesed off and not in the mood to listen!”

Some students might hold a ‘threshold’ mark in their heads – the score which to them is deemed acceptable. This is based on flawed extrapolation from marks to grades (see above) and may lead to complacency or an unwillingness to engage with feedback. If you get 12/16 on an essay and you “know” that this equates to an A grade, and you need an A to get your university place, then why bother trying to make it any better? On the other hand, if the only information you’ve been given is how to get better then that’s all you can respond to.

Extended answers are typically scored using level-based marking which often means a holistic judgement rather than simply awarding ‘points’ across different assessment objectives and then totting them up (even if this is what some examiners might actually do in reality). This makes it really hard to be able to quantify how students might improve. If someone gets 6/12 we can easily tell them very specific things to improve, but there’s no way we could accurately and reliably tell them how to make it a 7, or an 8 etc. The mark becomes a distraction because students are trying to figure out how much they need to improve rather than just how.

Similarly, the way questions are written and marked means there is not one ‘right’ way to produce a particular answer. Two answers which ‘feel’ very different might score similarly, and it’s very hard to explain clearly to students how and why this is. This also means that advice about looking at answers which scored higher than your own can, without very careful teacher input, be unhelpful as students are unlikely to be expert enough to discern what they personally need to do to improve.

Stop giving marks and grades. Simple, right?

There are times, of course, when giving a mark or grade is appropriate. When students have completed work under authentic exam conditions and we believe it does truly reflect that student’s capability. When they have been assessed on a significant sample of the domain (eg mocks). When they’re considering their future options and need a rough benchmark of potential attainment to guide their choices. But beyond these specific, and infrequent, situations, there is probably little merit in it.

However, there are problems with applying this philosophy in a personal capacity if it’s not supported by and reflective of the whole-school culture. If another teacher in your department continues to score or grade then you are definitely the bad guy. If you’re the only teacher refusing to give a mark or grade then students may presume you’re the one doing it wrong. Even worse, this view may well be echoed amongst senior leaders. If your school reporting cycle requires an attainment grade that is closely coupled with performance in an assessment then it can be hard to explain to students the extent to which their grade has been informed by that assessment, rather than it being a direct translation of their score. This underlines the importance of devoting time and training to improving assessment literacy in all staff, especially senior leaders. 

To summarise:

  • Giving marks or grades in most circumstances is simply too unreliable or unhelpful to be worth the time or effort.
  • Where it is necessary to award marks or grades, teachers and schools need to communicate clearly, and consistently, precisely what those data do and don’t actually mean.

Paul teaches psychology A level and is director of teaching and learning at a secondary school. You can find him on Twitter/X @PaulCline_psy.
