Monday 27 January 2014

Final Exams and the Art of Percentage Marks

It's exam time again. High school students around the province are anxiously preparing for their final exams. Many students would be very happy with the idea of abolishing examinations. I would be too: they are the ultimate teacher make-work program. Not only do I design examinations for my students, I also take care to create a fair environment where they can write them, then spend hours marking them all and recording the grades in order to come up with final marks and informed comments on learning.

It's work all around, I tell you. I'm not complaining, though: it's a real art form to come up with a single percentage mark that summarizes student learning. That's what I want to talk about today: the art of a percentage mark. That single mark. You know, the one on the report card.

Percentage marks are always based on evidence of student learning. Each teacher gathers evidence of student learning in four KICA categories over time, and then creates unit tests or projects that summarize the key learning across these four categories. (The KICA categories are part of the Ontario Achievement Chart and stand for knowledge, inquiry/thinking, communication and application. These are province-wide performance standards.)

Without evidence of learning, a teacher cannot determine if learning has taken place, and therefore cannot assign a mark. In these cases, the student will be left with an incomplete course on their high school transcript at best. That's why teachers are such sticklers for due dates and assignments being handed in. And this line of thought leads us to the "most consistent, most recent" concept that teachers talk about.

Most Consistent, Most Recent
The Ontario Growing Success policy document makes it clear that teachers should use the “most consistent, most recent” evidence of student learning when calculating percentage marks (Growing Success, page 39).

A lot of teachers I know use Markbook (Asylum Software) to record and calculate grades. This year, in accordance with the Growing Success policy document, the Markbook developers changed the default mark algorithm to Blended Mode (away from Weighted Average).

This led to some very interesting situations for me and my colleagues when the progress reports came out last October. I handed out two separate printouts to my students because I had not yet mastered the new output format of Markbook. Some student marks differed by more than 5% between the two reports. This led to several great conversations with students about what a percentage mark means, and I want to summarize the message of these conversations below.

Mean, Median and Mode
MarkBook has five distinct ways of calculating a student’s percentage grade which can produce five different marks for each student:
  • Weighted Average - the arithmetic mean 
  • Weighted Median - arithmetic median
  • Weighted Mode - arithmetic mode
  • Blended Mode - a combination of modes, levels and percentages, explained below 
  • Blended Median - a combination of medians, levels and percentages, explained below
There is a brief description of mean, median and mode in the image above. The mean is the sum of all the numbers divided by the quantity of numbers in the set. The median is the middle number once the set is sorted, or the average of the two middle numbers when the set has an even count. The mode is the most commonly occurring number. The range is the highest number minus the lowest number. For mark sets the range is simple: it's 100 - 0 = 100. (I'll define blending and weighting in a moment.)
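For readers who like to see the arithmetic, here is a minimal Python sketch of the three measures using the standard library's statistics module. The mark set is made up for illustration and does not come from any real class.

```python
from statistics import mean, median, mode

# Hypothetical mark set (percentages) for one student; not real data.
marks = [55, 72, 75, 75, 78, 80]

average = mean(marks)      # sum of the marks divided by how many there are
middle = median(marks)     # average of the two middle marks, since there are six
most_common = mode(marks)  # the mark that occurs most often

print(average, middle, most_common)  # 72.5 75.0 75
```

Notice that the single low mark of 55 pulls the mean below the median and mode, which both sit at 75.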

The Art of Percentages
The intention of the Ministry is that each teacher considers the mean, median and mode simultaneously, and considers a student's entire mark set in order to make an informed judgement about each student’s acquisition of the curriculum. There is a diagram to describe this below.
  1. When there is strong agreement among all three calculations, e.g. a range of less than 5% between the mean, median and mode, the teacher can be quite confident that marks are not being skewed by any one calculation and the average of the three measures strongly indicates the student's progress. This is the symmetric distribution seen in the image below, with strong agreement between the mean, median and mode.
  2. When there is not strong agreement among all three calculations, as in the skewed distribution above, the teacher is to look at the most consistent and recent levels of achievement in conjunction with the mean, median and mode (Growing Success, page 39). For instance, if the student aces the final exam, the mark on the exam will weigh strongly in the teacher's professional judgement because it is the most recent evidence of learning.
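The agreement check in the first case can be sketched in a few lines of Python. This is only an illustration of the 5% rule of thumb described above: the function name, the tolerance parameter and both mark sets are my own assumptions, not anything taken from MarkBook or Growing Success.

```python
from statistics import mean, median, mode

def measures_agree(marks, tolerance=5.0):
    """Return True when the mean, median and mode lie within `tolerance` points."""
    measures = [mean(marks), median(marks), mode(marks)]
    return max(measures) - min(measures) < tolerance

# Hypothetical mark sets, not real data.
steady = [55, 72, 75, 75, 78, 80]  # the three measures sit close together
skewed = [20, 75, 75, 78, 80]      # one early outlier drags the mean down

print(measures_agree(steady))  # True: the spread is 2.5 points
print(measures_agree(skewed))  # False: the mean is 65.6, the median and mode are 75
```

When the check fails, as in the second mark set, that is the cue to look at the most consistent and most recent evidence rather than trusting any single calculation.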
Weighting and Blending
Two additional terms that need defining are weighting and blending.

Weighting refers to assigning some student scores a greater weight than other scores, because they come later in a unit (a final project) or are more significant (such as a unit test). For example, if a teacher weights a vocabulary quiz at 1 and a unit test at 10, the unit test counts as much as 10 quizzes: it is 10 times as significant as the quiz.
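Here is a small Python sketch of that quiz and unit test example. The two marks are hypothetical, and the helper function is mine rather than MarkBook's.

```python
def weighted_average(scores):
    """scores is a list of (mark, weight) pairs; returns the weighted mean."""
    total_weight = sum(weight for _, weight in scores)
    return sum(mark * weight for mark, weight in scores) / total_weight

# Hypothetical marks: a vocabulary quiz weighted 1 and a unit test weighted 10.
scores = [(60, 1), (82, 10)]

print(weighted_average(scores))  # 80.0, versus 71.0 for the unweighted mean
```

The weak quiz barely moves the result because the unit test carries ten times the weight.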

Blending refers to a three-step process that starts with a weighted median or mode in each KICA category (knowledge, inquiry, communication and application), followed by:
  1. Converting this number to an achievement level (0, 1, 2, 3 or 4).
  2. Converting this level back to a percent using the mid-range value of that level (e.g. if the mode is level 3 with a base of 70 and the base of level 4 is 80, then the percent assigned to a level 3 result is 75%), and finally,
  3. Finding the weighted average of these converted modes or medians across the KICA categories.
Why so complicated? Well, the fact of the matter is that blended mode is the measure that most closely reflects policy: “most consistent, more recent” (Growing Success, page 39). All deliberations around determining a final term mark/grade are intended to begin from the blended mode as a point of reference.
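The three blending steps can be sketched in Python as follows. Treat this as a simplified illustration, not MarkBook's actual algorithm: it uses a plain (unweighted) median inside each category, the level bands other than the 70 and 80 bases mentioned above are my assumptions, and the marks and equal category weights are made up.

```python
from statistics import median

# Assumed level bands as (level, base, top) in percent. Only the base of 70
# for level 3 and 80 for level 4 come from the text; the rest are guesses.
LEVEL_BANDS = [(4, 80, 100), (3, 70, 80), (2, 60, 70), (1, 50, 60), (0, 0, 50)]

def to_level(percent):
    """Step 1: convert a percentage to an achievement level (0 to 4)."""
    for level, base, _top in LEVEL_BANDS:
        if percent >= base:
            return level
    return 0

def level_to_percent(level):
    """Step 2: convert a level back to the mid-range percent of its band."""
    for band_level, base, top in LEVEL_BANDS:
        if band_level == level:
            return (base + top) / 2
    raise ValueError(f"unknown level: {level}")

def blended_median(marks_by_category, category_weights):
    """Step 3: weighted average of the converted category medians."""
    total = 0.0
    total_weight = 0.0
    for category, marks in marks_by_category.items():
        converted = level_to_percent(to_level(median(marks)))
        weight = category_weights[category]
        total += converted * weight
        total_weight += weight
    return total / total_weight

# Hypothetical marks for one student in each KICA category.
marks_by_category = {
    "knowledge": [72, 78, 75],
    "inquiry": [65, 68, 71],
    "communication": [81, 85, 79],
    "application": [74, 70, 77],
}
category_weights = {name: 1 for name in marks_by_category}  # assumed equal

print(blended_median(marks_by_category, category_weights))  # 76.25
```

With these made-up numbers the category medians of 75, 68, 81 and 74 become levels 3, 2, 4 and 3, which convert back to 75, 65, 90 and 75, for a blended mark of 76.25.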

My Personal Art of Percentage Marks
I started this semester influenced by my personal comfort with using the arithmetic mean as my preferred representation of central tendency.

Most teachers I know share this comfort; most teachers I interact with take the mean as a valid interpretation of a mark set. However, I began to see problems, especially because the mean can be influenced by outliers. I saw that when I left only the summative "assessment of learning" marks in my calculations, the arithmetic mean tended to swing by 1-5%. My observation was that it is really not fair to penalize a student for failing to learn quickly enough when there is evidence they did indeed learn the material by the end of the course.

As the semester progressed, I saw and experienced firsthand that weighted modes and weighted medians are a much better representation of central tendency in a student mark set. These values are not greatly influenced by outliers. You can see this in the image below. The mode and median describe this mark set more accurately than the arithmetic mean.
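One simple way to picture a weighted median or weighted mode is to count each mark as many times as its weight. I don't know whether MarkBook computes them this way, so treat the Python sketch below as an assumption for illustration, with made-up marks and whole-number weights.

```python
from statistics import mean, median, mode

def expand_by_weight(scores):
    """Repeat each mark `weight` times; weights must be whole numbers."""
    expanded = []
    for mark, weight in scores:
        expanded.extend([mark] * weight)
    return expanded

# Hypothetical unit: a bad early quiz, a second quiz, a project and a unit test.
scores = [(40, 1), (75, 1), (78, 3), (80, 5)]
expanded = expand_by_weight(scores)

weighted_mean = mean(expanded)
weighted_median = median(expanded)
weighted_mode = mode(expanded)

print(weighted_mean, weighted_median, weighted_mode)  # 74.9 79.0 80
```

The early mark of 40 drags the weighted mean down to 74.9, while the weighted median and mode stay at 79 and 80, close to what the student showed on the heavily weighted work.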

It is my intention that a student can have a 'bad hair day' mark-wise early in the unit, without it shattering their final mark. Similarly, acing a few quizzes early in the unit before slacking off should also not unduly influence a final mark. Consistency is the key.

What really matters for the percentage mark is that the student demonstrates achievement of curriculum expectations by the end of the course. (Students are also required to demonstrate learning skills and work habits. These are an integral part of a student’s learning, but they are evaluated separately where possible. That is a whole other discussion.)

This past midterm, I started with the arithmetic mean, and then used both the weighted median and weighted mode to decide whether to leave the mark the same or adjust it upwards or downwards so that it consistently represents the student's mark set.

This process of mark evaluation puts the ‘professional judgement’ of the teacher back at the forefront of learning. Growing Success asks teachers to use professional judgement in order to:
  • determine which specific expectations should be used to evaluate achievement of the overall expectations;
  • determine which specific expectations will be accounted for in instruction and assessment but not necessarily evaluated;
  • weigh all evidence of student achievement in light of these considerations.

"Percentage marks represents a student’s achievement of overall curriculum expectations, and should reflect the student’s most consistent level of achievement, with special consideration given to more recent evidence." (Growing Success, page 39)

As Growing Success states, "Teachers’ professional judgements are at the heart of effective assessment, evaluation, and reporting of student achievement." (Growing Success, page 8)


I’m very much for it.

References:
Provincial Report Card 
Growing Success
Markbook 
Ontario Achievement Charts