Michigan Portfolios

Rating Papers with Rubrics

If you have very little experience working with rating rubrics but feel you would like to begin, here is a preliminary orientation for you.

Paul Diederich established in the seventies, in his book Measuring Growth in English, that people who begin very far apart in how they evaluate papers can, through practice, move much closer together in their views and become “reliable” in their rating of papers.  He also established that rating papers (or portfolio work) can be done on a large scale.  There has been much debate on that second point.  We would agree with the writers of an article from the 1990s titled “Large-Scale Portfolio Assessment:  Difficult But Not Impossible.”

To give you a short reminder lesson, two terms come up most often in assessment research discussions:  reliable and valid.  What makes some people so fond of standardized tests is that, so long as the right and wrong answers are properly marked, the score is reliable.  What makes the rest of us so distressed by those tests is that they never achieve what is called “construct validity.”  That is, in short, they do not actually measure what they claim to measure.

So, by looking at actual student work done for authentic purposes, we have taken a giant step toward achieving validity.  Our challenge is the reliability part.  We need to work toward reliable accuracy of scores:  different writings are viewed through the same analytical lens, so we can trust that our concept of a “2” rating, for example, will remain the same from paper to paper.  And we need to work toward what is called “inter-rater” reliability:  different raters will give the same paper the same rating.  Both of these, of course, can be achieved and have been achieved in various places, whether through training by state departments of education or through particular research projects.
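For readers who like to see the arithmetic, inter-rater reliability can be checked with two common figures:  simple percent agreement, and Cohen’s kappa, which corrects agreement for what two raters would match on by chance alone.  The sketch below (the rater names and scores are made-up illustrations, not data from the original text) shows both calculations for two raters scoring the same ten papers on a 1–4 rubric.

```python
# Illustrative sketch: percent agreement and Cohen's kappa for two raters.
# All names and scores here are hypothetical examples.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of papers given the identical rating by both raters."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same score at random,
    # given how often each rater uses each score.
    expected = sum(counts_a[r] * counts_b[r] for r in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two raters score ten papers on a 1-4 rubric (made-up data).
rater_a = [2, 3, 3, 1, 4, 2, 3, 2, 4, 1]
rater_b = [2, 3, 2, 1, 4, 2, 3, 3, 4, 1]

print(percent_agreement(rater_a, rater_b))        # 0.8
print(round(cohens_kappa(rater_a, rater_b), 2))   # 0.73
```

Here the raters agree on 8 of 10 papers (80%), and kappa of about 0.73 suggests the agreement is well above chance; as a rough rule of thumb, values above 0.7 or so are often treated as acceptable for rating work like this.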

You may wish to begin by using someone else’s rubric, one that makes sense and seems useful to you.  That is easiest, and it is quite legitimate.  Normally, a school’s history with a rubric is that one which at first seemed quite appealing soon comes to seem inadequate.  That is often when people begin to design their own rating systems.

If you wish to design your own rubric, a rudimentary guide for doing so is provided in this “Getting Started” section of the website.

Often experts make heavy use of “anchor papers”—papers that have been selected to demonstrate a classic “2” or a “3” or whatever score.  It is even possible to select papers that represent a “gray area” between two ratings.

In the beginning you may not have anchor papers.  We didn’t.  There is much to be gained when a grade-level teaching team meets to practice rating papers and to discuss the teaching implications of the ratings.  It is important to remember that in such discussions the point is not to “win” by defending your score to the end but rather for the group to move closer together in how they see papers.

If you wish to achieve ratings with reliable outcomes (a good goal), you will need to train raters—probably educators in your system—to rate reliably, and then provide them time and papers stripped of names and dates so they can rate objectively.