Poor quality learning assessments are crumbling under the weight of the decisions they inform

By Rachel Outhred, Education Consultant, Oxford Policy Management

Much of the recent international discussion regarding the measurement of learning outcomes globally has been driven by the need to monitor Sustainable Development Goal 4, ‘to ensure inclusive and equitable quality education for all’. Such learning assessments, as will be shown in the next GEM Report due out in October of this year, are one of many types of mechanisms being used to hold different actors to account if progress towards SDG4 is lagging. However, against the backdrop of increased threats to aid funding in countries such as the UK, and the prevalent use of ‘pay by results’ in development programming (such as the £344 million Girls’ Education Challenge Fund), the stakes involved in measuring learning outcomes are being raised.

The need to measure learning outcomes well in development programming is rarely seen as overly contentious until we start to drill down into the practicalities. In practice, value for money concerns, the need for rapid data to inform policy and a simple lack of technical know-how often result in unreliable or invalid learning measures.

For example, in a recent systematic review of the assessment of language and literacy skills for children in developing countries between 1990 and 2014, Sonali Nag and her team of researchers found that the reporting of reliability and validity for assessments in developing countries is very rare. Over 70 percent of the studies rated as having ‘Moderate’ and ‘High’ methodological rigour did not even report on test reliability. Moreover, studies that did report reliability included levels as low as 0.23 on Cronbach’s alpha – a common measure of internal test reliability that typically ranges between 0 and 1, with anything below 0.5 considered unacceptable. Finally, few of the studies reviewed used methodologies that can ensure tests are capturing a child’s true performance, rather than capturing the quality of the assessment items themselves.
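For readers unfamiliar with Cronbach’s alpha, the sketch below shows one way it could be computed from a table of item scores, using the standard formula (number of items, the variance of each item, and the variance of total scores). It is an illustration only: the function name and the data are hypothetical and are not taken from Nag’s review.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Return Cronbach's alpha for a table of item scores.

    `scores` is a list of rows, one row per test-taker; each row holds
    that person's score on every item, in the same item order.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two test-takers")

    # Sample variance of each item (column) across test-takers.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each test-taker's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three test-takers, two items that agree perfectly,
# so alpha comes out at its maximum of 1.0.
consistent = [[0, 0], [1, 1], [2, 2]]
print(cronbach_alpha(consistent))
```

When items do not move together at all, the same calculation falls towards zero, which is why a reported value of 0.23 signals a test whose items are barely measuring the same thing.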

It is clear that large-scale assessments cost money, and equally clear that the limited international development resources of governments and donors demand a focus on value for money.

However, value for money doesn’t mean simply choosing the cheapest option, and it should look at longer-term sustainable benefits, not just short-term tangible resources. At the same time, assessments and evaluations must ensure that policy makers can make practicable decisions to inform the next programme before the assessment data becomes outdated or irrelevant.

What are the risks of choosing poor quality instruments, and what are the factors that should be considered when deciding which learning assessments to pursue?

  1. Developing robust learning measures is a complex task

While it is relatively simple to measure something in the physical world, such as height or weight, measuring learning requires a more complex and nuanced approach. (It has even been suggested that learning cannot be quantified at all.) Not only is learning less observable and quantifiable, it is much less consistent: if you were to measure your height twice, you would be much more likely to get the same result than if you took two tests of the same difficulty on different days. Standardising test conditions, test administration and test marking, and knowing what to test and why, all require expertise.

  2. Low quality measures produce a lack of information

An assessment that is too easy or too difficult for the vast majority of students leads to a lack of information, defeating the purpose of the assessment. If most of the students score very low, the information gathered only shows what students don’t know, not what they do know. If all the students achieve perfect or near-perfect scores, we cannot ascertain the upper threshold of what a group of students knows. This is the equivalent of using a set of scales that only measure up to ten stone to monitor adult weight in the UK: such an approach would tell you very little about the vast majority of the population.
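A simple first check for this problem is to count how many students sit at the very bottom or very top of the score range. The sketch below is a minimal illustration with made-up scores; the function name is hypothetical, and what counts as "too many" students at either extreme is a judgement call that depends on the purpose of the assessment.

```python
def floor_ceiling_shares(scores, min_score, max_score):
    """Return the share of students at the minimum and at the maximum score."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    # Students at the floor: the test shows only what they do not know.
    floor_share = sum(1 for s in scores if s <= min_score) / n
    # Students at the ceiling: the test cannot show how much more they know.
    ceiling_share = sum(1 for s in scores if s >= max_score) / n
    return floor_share, ceiling_share


# Hypothetical scores out of 10 for five students.
floor_share, ceiling_share = floor_ceiling_shares([0, 0, 0, 5, 10], 0, 10)
print(f"at floor: {floor_share:.0%}, at ceiling: {ceiling_share:.0%}")
```

In this made-up example three of the five students score zero, so the test says almost nothing about what most of the group can actually do.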

  3. What we test becomes what is taught

Teachers and students tend to see the contents of assessments as reflecting what students should learn and what teachers should teach. Nag’s recent review highlights how assessment informs teaching practice. Poor quality assessments can test students on things they don’t need to know and this can lead to confusion. In doing so, they re-orientate what is taught in the classroom away from what is most beneficial for students. Sonali Nag’s research finds that ‘teachers who assess well and use test information well, teach better’.

  4. Value for money

While developing quality assessments requires an investment in technical expertise, the most expensive aspect of measuring learning is actually administration – and this cost doesn’t usually vary depending on the quality of the assessment. There is a strong value for money argument, then, to invest in developing quality assessments so that these costs yield as much accurate information as possible.

Education assessments can have real, wide-reaching and significant effects. Key findings can influence, for example, education reform, language of instruction policy, disbursements for service providers, the continuity of programmes, school funding, parental school choice and teacher pay. The quality of the evidence must reflect the weight of the decisions it seeks to inform. In short, while value for money and rapid information are important, we need to think through the real implications for value and utility, as the next GEM Report on Accountability will surely do, because learning assessments are increasingly used as a basis for key decisions in education delivery.



  1. The measurement issues discussed are sensible, but whom does measurement help? The most direct beneficiaries seem to be donor staff doing their jobs. In which poor countries was measurement used to improve instruction with practically significant results? Perhaps the author can give us examples.

    It will be interesting to see when instruction will get the attention and detail that measurements get.
    When will blogs and bloggers start dealing with information processing in students’ heads? Hopefully some time before 2030.

  2. Good summary, Rachel, of the quality of the measurement instrument. We really need to focus more on these qualities to ensure that the observed abilities of learners match their true abilities.

  3. It is necessary to get educators, teachers, community, administrators (even input from students) at k-12 level to decide on what is to be taught and learned. From there it should be somewhat easy to find unobtrusive ways to monitor and evaluate to determine mastery. Perhaps findings from IEA international surveys on curriculum, teaching, evaluation, administration, financing, etc can be helpful.
