Carrying out national and cross-national learning assessments involves an enormous commitment of time, effort and resources. As such, they are most cost effective when they serve multiple purposes, which might also include providing an input to global monitoring of learning outcomes.
Moreover, if these assessments are meant to track progress over time, rather than serve as a one-off measure of student learning, they require sustainable financing and political commitment that outlast any particular government or minister in power. In many parts of the world, these conditions are difficult to meet, and solutions on the ground are inevitably imperfect.
Thus the development of global measures of learning is not simply a technical issue, but also a political one. In some countries opposition to learning assessments has become apparent among different groups, including parents, teacher associations and political parties.
To address these concerns, the GEM Report recommends that governments embrace open and inclusive approaches that prioritize their countries' needs and capacities against the criteria of inclusivity, efficiency and feasibility. They should also make concerted efforts to build consensus around the content, quality and process of assessment activities.
Agreeing what areas to assess is a critical first step
Learning assessments need to set basic parameters to determine a minimum proficiency level in a learning domain such as reading, mathematics or civics. First, what are the boundaries of the domain being assessed? Second, what is an expected progression of learning within these boundaries in primary and secondary education? Third, what questions and responses demonstrate that a learner has reached a particular level of proficiency? Finally, how are different proficiency levels defined and what criteria are used to distinguish between different levels?
Countries need to take multiple points of view into account in order to come to an agreement on each of these issues. Given recent experience, however, achieving such agreement is no easy task. Take, for example, the 2013 Third Regional Comparative and Explanatory Study (TERCE). TERCE was administered in fifteen countries, covering two grades (3rd and 6th) and four learning domains (reading, mathematics, natural sciences and writing). The experience of TERCE shows that, even in a region like Latin America, where countries' education systems share many features, an indicator of minimum proficiency in each domain that worked across all of them required negotiation.
The questions in the TERCE assessment were based on an analysis of common curricular elements in the participating countries to ensure their appropriateness for measuring learning outcomes in the region. Four different levels of proficiency for each domain and grade level were determined with the help of experts who ranked the items according to levels of difficulty. When the 2013 assessment introduced a different scale to report scores as compared to a previous survey in 2006, a High-Level Consultative Technical Council was established to advise national coordinators. Countries were closely involved in the process of reaching consensus, as they were aware of the communication challenges involved in making such a change.
A global metric of learning in any domain would require a similar process. Representatives from countries and regions would need to identify and agree upon what areas of existing national or regional assessments can be used, how learning progress in each domain is understood in different contexts, and which questions in existing assessments best capture learning progression.
Clear standards should be established to assure the quality of learning assessments
Once a decision to conduct a learning assessment has been made, an enabling environment and clear standards are also needed to help ensure reliability, validity and transparency. While quality assurance is a demanding task, it is a critical one. How can the international community be assured that national, regional or international assessments are fit for the purpose of global monitoring?
This raises two important issues.
First, the technical requirements are often quite stringent. The more stringent they become, the fewer organisations or administrative units will have the capacity to support such assessments. This could leave a limited pool of private service providers to dominate technical assistance in this area.
Monitoring learning outcomes globally should be seen as an international public good contributing to the achievement of international priorities in education and development. It should not be seen as an opportunity to increase a company’s market share in assessment assistance.
Second, existing resources to support capacity building for learning assessments are not allocated efficiently. For example, regional programmes such as SACMEQ, PASEC and TERCE have not received consistent financial support. Countries wanting to participate in assessment programmes are often not adequately or consistently supported. More critically, the conduct of existing donors aligns poorly with the principles of aid effectiveness, as individual agencies pursue short-term objectives that do not contribute to building sustainable national assessment systems. The lack of predictable funding for learning assessments limits the participation of poorer countries and weakens the potential impact of global measures of learning.
Achieving agreement for a global indicator of minimum proficiency is not an easy task
The easiest way to develop a global measure of learning would be for all countries to participate in a new international assessment. But this poses significant political challenges, and would likely not provide data fit for national policy purposes. The most effective way to construct a global learning metric is by building on existing assessments. This would involve using test items from existing assessments implemented in a range of educational settings across the world.
In the context of global monitoring, we need to consider two key issues relevant to the way any metric is designed. First, given the emphasis of the new global education agenda on leaving no one behind, collecting accurate background information on those assessed is essential. Yet assessments currently collect inconsistent background data, which hampers reporting of the global indicator by population characteristics. This is especially true for children of primary school age, who are not well placed to provide accurate information about their family circumstances.
The second issue involves the importance of knowing about the cultural and linguistic context in which assessments are carried out so as to improve how we interpret differences within and between countries.
Finding consensus matters
The challenges of deciding on metrics for measuring learning internationally clearly make this topic a political hot potato. Yet it is one we should encourage debate about, such as the discussions taking place at the CIES Fall Symposium this week, because tracking learning over time can serve important purposes. For national policy makers, monitoring learning gains or losses provides an indication of system performance. It can also serve as a marker of ongoing reform efforts to improve teacher preparation, broaden the relevance of instructional materials and assess the effectiveness of interventions targeting underachieving students.
Keeping tabs on student learning at the school level can help principals compare the achievements of their students with those of similar backgrounds in the same district, region or province. Such yardsticks better situate the specific learning challenges that individual schools face.
Achieving agreement about how to report learning comparably is vital if we’re to work out how to address the needs of all learners, and ensure lifelong learning and sustainable futures for all.
If part of the role of education is to bring about world peace, another dimension of the assessment question is how to measure domains of learning that foster social-emotional growth and skills for ‘Learning to Live Together’. In many places, these domains are marginalized for the simple reason that they are not assessed. Yet there are many approaches and tools for doing so already in existence. For guidelines, see Booklet #8 in the series of curriculum guidelines on integrating safety, resilience and social cohesion into country level planning and curriculum http://education4resilience.iiep.unesco.org/en/curriculum
Thank you for this interesting post, Aaron. I would like to call attention to a point you made that could easily get lost: “The second issue involves the importance of knowing about the CULTURAL AND LINGUISTIC CONTEXT in which assessments are carried out so as to improve how we interpret differences within and between countries.”
Serious consideration of languages and world views is not only for interpretation of results: it is for assessment itself. Testing in a language that learners do not speak or understand well, especially if their teachers are in a similar situation, is a big waste of time and resources, and I would say does more harm than good, because it causes people to believe that learning can only be communicated and assessed in dominant languages. “Serious consideration” means actually translating assessments into appropriate languages (as Ethiopia does, for example, on its national grade 4 and grade 8 assessments, depending on the “nationality language” used as medium of instruction; see e.g. Heugh et al. 2011) and ensuring that those translations use the appropriate terminology as well as illustrations/examples that are comprehensible in learners’ contexts. This requires back-translating and piloting so that the instruments actually assess the intended content. (Since this is done on “standardized testing” in high-income countries, why would we not insist on the same standards?)
Where learners’ own languages are not used for teaching and learning, I would argue that NO assessment can be valid, because it will mainly assess the ability to memorize (in the case of curriculum-based testing) and/or luck (in the case of general skills testing). Needless to say, this is highly unfair to individual learners, as well as to schools and to school systems.
I feel that cross-national assessments do to curriculum what the push for universal access did to quality– put the cart before the horse. Why not work toward quality teaching and learning first, then worry about getting children into (good) schools? Then it’s time to assess, to show what learners CAN do.
Thanks for this Education Post