Compare, align, track: The foundational learning data challenge

By Silvia Montoya, Director of the UNESCO Institute for Statistics, and Luis Crouch, First Vice-Chair of the UIS Governing Board

The SDGs provide an impetus to use or develop high-quality assessment programs for reporting. The language of the SDGs requires that “Global monitoring should be based, to the greatest possible extent, on comparable and standardized national data…”. SDG global indicator 4.1.1, ‘Proportion of children and young people (a) in grades 2/3; (b) at the end of primary; and (c) at the end of lower secondary achieving at least a minimum proficiency level in (i) reading and (ii) mathematics, by sex’, focuses on examining learning progression from the foundational through the early secondary years, using globally accepted benchmarks in learning areas universally accepted as critical.

In the build-up to the SDGs, and soon after, the UNESCO Institute for Statistics (UIS) and others in the education community faced a dilemma with two apparent extremes, both of which had advocates. On the one hand, there were those who supposed that a single global assessment could be developed and applied; long before the SDGs and even the MDGs, UNESCO entertained such an idea in its 1990s Monitoring Learning Achievement (MLA) project.

On the other hand, some members of the community advocated that every country (or perhaps region) should measure learning entirely on its own, arguing that measurements that allow comparisons are inherently misleading. The UIS took the much more difficult middle road because it judged this the only technically correct and politically wise way forward.

The task implied the creation of a set of global standards, in a time-consuming but necessary process. This would make it possible for disparate assessments to refer to common standards without having to carry out the same assessment, and a suite of supporting tools was developed to make this possible.

Despite these and other UIS efforts to make supportive tools available, such as information about assessment costs, guidance on fitness for purpose, calibrated modules such as the AMPL, and a growing bank of items that countries can use, the availability of learning outcomes data remains low. There are financial and technical reasons, but also, importantly, the learning assessment market functions inefficiently and inequitably: donors and, especially, countries lack sufficient clarity about the alternatives, which prevents informed choice. It is unlikely that those who participate in the assessment industry will, collectively or as individual actors, solve what are essentially ecosystem problems of coordination and public-good delivery.

Despite the progress, data coverage for 4.1.1a is not enough

SDG indicator 4.1.1 is being reported using various cross-national studies that are international (PIRLS, TIMSS) or regional (PILNA, SEA-PLM, PASEC, LLECE, SACMEQ) and share a single tool across participating countries. These tools were not designed for SDG reporting but, in 2018, the Global Alliance to Monitor Learning (GAML) and the Technical Cooperation Group on SDG 4 indicators (TCG) agreed that these assessments could be used to report learning based on the proficiency levels that “mapped” best to the global Minimum Proficiency Level (MPL). For example, the MPL for indicator 4.1.1a on reading in grades 2 or 3 is defined as follows:

  • Students accurately read and understand written words from familiar contexts.
  • They retrieve explicit information from very short texts.
  • When listening to slightly longer texts, they make simple inferences.

Information about national and other assessment programs can be found in the inventory of learning assessments. For the early grades, only two regional assessments (PASEC in Africa and ERCE in Latin America) can be used to report learning, while a recent UIS effort, the AMPLa, is just finalizing its piloting phase with four countries in Africa and one in Asia (India). There are many more assessments at the end of primary and the end of lower secondary. The methods for using other assessments to report are not yet reliably smooth or adequate.

Assessment programs by grade or age and use for reporting on SDG indicator 4.1.1

Grade or age: International assessment program
  • SDG 4.1.1a: Early grades
  • SDG 4.1.1b: End of primary
  • SDG 4.1.1c: End of lower secondary
  • 15 years: PISA
  • 5–16 years: ASER, UWEZO
  • 7–14 years: MICS

Note: Assessments in bold are used to report on SDG indicator 4.1.1.

The production of comparable learning outcomes data is progressing neither fast enough nor equally enough. Regardless of the coverage criterion (number of countries or population covered), coverage is much higher at the end of primary and the end of lower secondary than in grades 2 or 3.
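The gap between the two coverage criteria mentioned above can be made concrete with a small sketch. All country names and figures below are invented for illustration; they are not UIS data.

```python
# Hypothetical illustration: whether coverage looks adequate depends on the
# criterion used. A few large uncovered countries can leave population
# coverage far below country coverage. All numbers are made up.
countries = {
    # name: (population in millions, has early-grade learning data?)
    "A": (200, False),
    "B": (150, True),
    "C": (30, True),
    "D": (10, True),
    "E": (5, True),
}

covered = [name for name, (_, has_data) in countries.items() if has_data]
country_coverage = len(covered) / len(countries)

total_pop = sum(pop for pop, _ in countries.values())
covered_pop = sum(pop for pop, has_data in countries.values() if has_data)
population_coverage = covered_pop / total_pop

print(f"Country coverage:    {country_coverage:.0%}")    # 4 of 5 countries
print(f"Population coverage: {population_coverage:.0%}")
```

With these invented figures, four of five countries are covered, yet barely half the region's children live in a covered country, which is why the choice of criterion matters when judging progress.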

Coverage of learning assessments, by level of education

What can be done to improve reporting in the early grades?

Other early grade assessments that have been applied globally cannot be used for global reporting, mostly because they were not intended to generate comparable data: the Early Grade Reading/Mathematics Assessment (EGRA/EGMA), the PAL Network citizen-led assessments, and the Foundational Learning Module of UNICEF’s MICS household survey.

First, they were designed for national diagnosis, advocacy, program design, and program tracking and evaluation. However, over time they have been used in hundreds of country/language combinations, often with suitable adaptation to the orthography of each language, something the large international and regional assessments do not do. This matters because in grades 2 and 3 the language of testing makes a great deal of difference.

Second, EGRA/EGMA and PAL Network assessments are often administered to population sub-groups, with samples that were not meant to be nationally representative. By contrast, international and regional assessments use complex sample designs to support inferences at the national level.

Third, the sampling, variance calculations for cluster sampling, data custodianship, supervision, data audits and other administrative aspects of the EGRA/EGMA, PAL Network and MICS assessments are not always well documented in a single place. Without central, clear documentation of procedures, it is difficult to know whether those procedures were followed. By contrast, international and regional assessments centralize and regularize documentation, making it easily accessible.

Nevertheless, these assessments have the potential to be used for reporting. Where the sample was sub-national, it could be extended to allow inferences for the whole population. Where the measurement was one-off, it may be possible to repeat the assessment and link the rounds to measure progress over time. Even the fact that their links to the MPL have not been clear can be addressed with further work, either by adding to these assessments or by elaborating the MPL into sub-skills that allow linking. In a nutshell, a relationship can be established between the umbrella areas of EGRA, the PAL Network assessments or the MICS (i.e., fluency, accuracy and comprehension) and the MPL, using methods that take each language’s features into account.
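As a purely illustrative sketch of what one step of such linking might look like, the fragment below applies a generic mean-sigma linear transformation, a standard psychometric linking technique, to place scores from one hypothetical scale onto another. The method shown and every number in it are assumptions for illustration, not the UIS or GAML procedure, and real linking to the MPL involves far more than this.

```python
# Generic mean-sigma linking sketch (illustrative only). Suppose a common
# linking sample of examinees has scores on two assessment scales, X and Y.
# Matching the means and standard deviations gives a linear transformation
# from scale X onto scale Y. All scores below are invented.
import statistics

x_scores = [310, 325, 340, 355, 370, 385]   # linking sample, scale X
y_scores = [480, 492, 505, 517, 530, 542]   # same examinees, scale Y

mx, sx = statistics.mean(x_scores), statistics.stdev(x_scores)
my, sy = statistics.mean(y_scores), statistics.stdev(y_scores)

def x_to_y(x: float) -> float:
    """Linear transformation matching mean and standard deviation."""
    return my + (x - mx) * (sy / sx)

# A hypothetical cut score of 350 on scale X, mapped onto scale Y:
print(round(x_to_y(350), 1))
```

In practice, linking an assessment to the MPL would also require content alignment, common items or examinees, and checks that the relationship holds across languages, which is exactly the further research the text calls for.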

But these issues arise from a deeper and more fundamental problem: the inefficiency of the assessment market. The assessments were designed by NGOs, non-profits and a UN agency for specific purposes that fitted a specific niche or market demand. Those purposes might not have required national samples, or extensive documentation, and each assessment now has a certain inertia. Given the paucity of official, reported data handled by the UIS, it would be a shame if the energy and richness these efforts represent could not be harnessed to increase reporting substantially.

Bringing together these disparate efforts to assess learning requires an arbiter or broker to compare the various assessments, suggest ways to improve their documentation, and commission further research into how to link them to the MPL. The only arbiter or broker positioned for this kind of work is the UIS, together with the TCG processes it has set up.

