
On the way forward for SDG indicator 4.1.1a: our proposal

By Silvia Montoya, Director of the UNESCO Institute for Statistics

There has been a lot of discussion in recent months on global SDG indicator 4.1.1a on learning outcomes in early grades. As few countries have been able to report on it, its future is at risk ahead of the upcoming comprehensive review of the SDG indicators. We have explained in a recent blog the technical reasons that have delayed consensus and how we are working to overcome them. This blog attempts to sketch the outline of a sustainable solution.

An argument that has been voiced in recent days is that the perfect should not be the enemy of the good. A ‘good’ indicator must convey reliable, comparable information on learning outcome levels and trends – and be based on blocks of information that can guide policy and planning.

For indicator 4.1.1a data to be trustworthy, the following questions must be answered:

  • What content, or domains, must an assessment include to be comparable to others?
  • What are the minimum standards per domain and how should performance in each domain be aggregated to allow estimation of the share of students achieving the minimum proficiency level?
  • What data collection procedures ensure quality?

Various cross-national and national assessments lay claim to being contenders for reporting on indicator 4.1.1a. Yet these assessment tools were designed to serve different objectives in different education contexts, and global comparability was not necessarily one of those objectives.

Cross-national assessments may not even have been designed to be comparable – and, even when they are comparable, they tend not to measure the SDG 4 minimum proficiency level. In 2018, there was a consensus decision to use one of their own proficiency level definitions as a proxy for the global minimum proficiency level in reporting. Nor are cross-national assessments necessarily aligned with national curricula, especially in countries that were not among those that established the assessment programme in the first place.

National assessments can potentially be used as a basis for reporting, as long as further work is done to align them with the minimum proficiency level and to ensure the quality of the assessment procedure. One hybrid approach is to include within a national assessment enough questions designed to measure the global minimum proficiency level. This route would be sustainable: it respects national authority and priorities, it is targeted to local contexts, and it can provide evidence related to the national curriculum, pedagogy and policy. It also addresses concerns related to content and procedural capacity, building local capacity in assessment development, analysis and reporting, while ensuring alignment with the minimum proficiency level.

Reporting on proficiency in lower primary grades: the issues

Measurement in lower primary (compared to end of primary and end of lower secondary) introduces at least two extra types of technical complexity:

  • The young age of the children means that two different modes of administering the assessment have emerged: one-to-one individual assessment vs in-classroom group assessment.
  • The greater prevalence of teaching and learning in the mother tongue means that assessments are more likely to be administered in multiple languages, which requires parameters to be developed in each of these languages to ensure comparability.

As always, there are pros and cons to each type of assessment. The context-appropriate choice should be a country decision. The challenge is how to accommodate both types of assessment in reporting so that they are comparable.

Group-administered tests are more cost-effective and, in contexts where most children are around or above the minimum proficiency level, well suited to guide policymaking. However, they do not allow the identification of problems with more foundational skills and might therefore be less useful for policy in countries where most children lack precursor skills.

To illustrate, consider the minimum proficiency level requirements for reporting indicator 4.1.1a. A test that uses 20 items in reading comprehension (the desirable outcome, as it enables children to pass from the ‘learning to read’ stage to the ‘reading to learn’ stage) is suitable for reporting on indicator 4.1.1a. Likewise, an assessment that reliably identifies the ultimate outcome by including 10 reading comprehension items and 10 items on the precursor skills to reading (oral language comprehension and decoding) would also be suitable. Which parts of the full range of skills to cover is likewise a matter for each country to decide as part of a national measurement strategy and with reference to its context.
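As a purely illustrative sketch, the following Python snippet shows how the two test compositions described above could be screened for their ability to report against the minimum proficiency level. The class, the item-count threshold and the assessment names are hypothetical assumptions for illustration, not UIS specifications:

    # Hypothetical sketch: two 20-item test compositions checked against
    # an assumed requirement of at least 10 reading comprehension items
    # to report reliably on the minimum proficiency level (MPL).
    from dataclasses import dataclass

    @dataclass
    class TestComposition:
        name: str
        items_by_skill: dict  # skill name -> number of items

        def covers_mpl(self, comprehension_min=10):
            # Assumed rule: enough reading comprehension items are needed
            # to report against the MPL; the threshold is illustrative.
            return self.items_by_skill.get("reading_comprehension", 0) >= comprehension_min

    # Composition A: all 20 items target the ultimate outcome.
    full_comprehension = TestComposition(
        "Assessment A", {"reading_comprehension": 20}
    )
    # Composition B: 10 comprehension items plus 10 precursor-skill items
    # (oral language comprehension and decoding).
    hybrid = TestComposition(
        "Assessment B",
        {"reading_comprehension": 10, "oral_comprehension": 5, "decoding": 5},
    )

    for test in (full_comprehension, hybrid):
        status = "can" if test.covers_mpl() else "cannot"
        print(f"{test.name} {status} report against the reading MPL.")

Under these assumptions both compositions qualify, which is the point of the hybrid route: precursor items add diagnostic value without forfeiting MPL reporting.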

[Figure: Unpacking the minimum proficiency level for SDG indicator 4.1.1a – Reading]

In practice, various approaches to assessment in lower primary grades are available for countries to choose from. Some measure more foundational skills (Assessment 1) and therefore cannot report at the minimum proficiency level, though they can report at the foundational skill level. Others partially measure the ultimate skill, i.e. reading comprehension (Assessments 2 and 3). Yet others (Assessments 4 and 5) measure only the minimum proficiency level and higher-order skills.

Setting global standards for various assessments

The pending question is how to include in reporting those assessments that do not reliably test the ultimate outcome (reading comprehension) but do test foundational and/or precursor skills.

Reporting on proficiency in lower primary grades: a solution

Given these alternative scenarios, and in order to achieve the two objectives of reporting and guiding improvement in assessment tools, the UIS is proposing a reporting scheme that relies on disaggregating or ‘unpacking’ the reporting of the minimum proficiency level by skill or subskill, and on allowing partial reporting when an assessment does not measure the minimum proficiency level.

The unpacking of reporting has several advantages. In particular, it would:

  • accommodate existing tools, as long as they meet procedural quality criteria;
  • allow the minimum proficiency level to be measured using assessments that have different test compositions and even different types of administration; and
  • facilitate the definition of the desirable standards.

This last point reminds us that more effort is needed, as multiple standards need to be defined for each skill or subskill and for each language.

With reference to the example of five assessment programmes in the figure above, the table below shows how reporting would look for reading, once additional work on standards has been completed. It would be possible for some assessment programmes to report on foundational and/or precursor skills (but not the minimum proficiency level) if a programme does not currently cover that level.

[Table: Disaggregation of reporting by skills for SDG indicator 4.1.1a – Reading]
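Since the table itself is not reproduced here, the short Python sketch below illustrates what ‘unpacked’ reporting by skill might look like: each programme reports the share of children proficient only in the skills it actually measures, with skills outside its coverage marked as not measured. All programme names and figures are invented for illustration:

    # Hypothetical illustration of 'unpacked' reporting by skill.
    # Programme names, skill coverage and percentages are invented;
    # None marks a skill the programme does not measure.
    reports = {
        "Assessment 1": {"oral_comprehension": 72, "decoding": 55, "reading_comprehension": None},
        "Assessment 3": {"oral_comprehension": 70, "decoding": 58, "reading_comprehension": 48},
        "Assessment 5": {"oral_comprehension": None, "decoding": None, "reading_comprehension": 61},
    }

    for programme, skills in reports.items():
        for skill, share in skills.items():
            cell = f"{share}%" if share is not None else "not measured"
            print(f"{programme:12} | {skill:22} | {cell}")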

What would the final steps be?

A few steps would need to be completed in the coming three months to finalize the definition of the eligibility criteria:

  • Define the minimum acceptable technical parameters for combining different skills to report on the minimum proficiency level.
  • Set the standards for each precursor skill by language or language group.
  • Develop an aggregation or scoring method for reporting against the minimum proficiency level – and for subskills (e.g. foundational skills) if desired (one possible shape is sketched after this list).
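To make that last step concrete, here is one possible shape such an aggregation rule could take, sketched in Python. The conjunctive rule (a child meets the minimum proficiency level only if at or above the standard in every required subskill), the subskill list and the cut scores are all illustrative assumptions, not an agreed UIS method:

    # One hypothetical aggregation rule for the MPL: a child counts as
    # meeting the minimum proficiency level only if they reach the
    # standard in every required subskill (a conjunctive rule). The
    # subskills and cut scores below are illustrative assumptions.
    STANDARDS = {
        "oral_comprehension": 0.6,
        "decoding": 0.7,
        "reading_comprehension": 0.5,
    }

    def meets_mpl(scores):
        """True if the child reaches the assumed standard in every subskill."""
        return all(scores.get(skill, 0.0) >= cut for skill, cut in STANDARDS.items())

    def share_meeting_mpl(cohort):
        """Share of children in a cohort classified at or above the assumed MPL."""
        return sum(meets_mpl(child) for child in cohort) / len(cohort)

    # Tiny invented cohort: the first child clears every cut, the second
    # falls short on decoding, so the reported share is 50%.
    cohort = [
        {"oral_comprehension": 0.8, "decoding": 0.75, "reading_comprehension": 0.60},
        {"oral_comprehension": 0.7, "decoding": 0.50, "reading_comprehension": 0.55},
    ]
    print(f"Share meeting the assumed MPL: {share_meeting_mpl(cohort):.0%}")

Other rules are conceivable (e.g. a weighted composite score with a single cut-off); choosing among them, and setting the standards per language, is precisely the work the steps above would complete.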

Once these inputs have been completed, reporting by national and cross-national assessment programmes would be expected to expand. In particular, this development would open opportunities for reporting from national assessment programmes and would support countries’ choices. It would also allow decisions on the reporting frequency that would be useful and meaningful for policy. Reporting on mathematics would be expected to follow the same procedure.

 
