Data integration: How do we measure progress towards SDG 4?

By Silvia Montoya, Director, UIS, and Manos Antoninis, Director, GEM Report

Working out how to monitor the ambition in our global education goal, SDG 4, required a certain amount of innovation back in 2015. One of the key suggestions made at the time was that ‘the more data can be combined, the more useful they are’. Data integration, in other words.

Even before the SDGs, other sectors had faced similar challenges in combining different data sources and types. Wasting and stunting, for instance, which are measures of malnutrition, were calculated thanks to a Joint Child Malnutrition Estimates group in 2011. In health, multiple administrative and survey data sources were combined by the UN Inter-agency Group for Child Mortality Estimation, which created a new model to generate annual estimates for under-5 mortality, and by the Inter-Agency Group for Maternal Mortality Rates.

Which parts of SDG 4 do we monitor with data integration?

Data integration can involve merging either different sources of the same type or different types of sources. Either way, education statisticians increasingly have to work out how to incorporate these sources in the estimation of indicators. It is not always simple.

An example of the first is learning outcomes from different assessments, which are the same type of source but often have slightly different methodologies. Integration is needed because the results are not immediately comparable and may require further analysis. An example of the second is the out-of-school rate, which can rely on both administrative and survey data, as seen on the VIEW website, or teacher continuous professional development, which can draw on administrative and learning assessment data.

Distribution of SDG 4 global and thematic indicators, by potential data source

Integrating data to monitor completion rates

In 2020, a review by the Inter-agency and Expert Group on SDG Indicators approved the completion rate at three levels of education (primary, lower secondary and upper secondary) as a global indicator. It was one of only six among more than 200 proposals to be successful. Estimating completion rates requires some form of integration to allow for the fact that many children enrol late and some repeat years, especially in poorer countries.

One of the benefits of combining multiple survey data sources is that it can fill the gaps left by infrequent survey cycles or sampling errors. The approach taken borrows from similar solutions in health statistics but has been adapted to the education context. It estimates an underlying trend. Late completion is explicitly modelled by specifying the magnitude of the delay as a function of age. Age misreporting concerns are also addressed. By addressing various data quality concerns associated with survey data, these estimates are also less sensitive to individual surveys, the year in which they were conducted, and the type of survey that happens to be the latest available in a given country.
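
To make the idea of estimating an underlying trend from several surveys more concrete, here is a minimal Python sketch. It is not the model described above: it fits a simple weighted straight line on the logit scale and does not model late completion or age misreporting. The function name `fit_trend`, the weights and all the numbers are hypothetical and chosen only for illustration.

```python
import math


def logit(p):
    """Map a rate in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Map a real number back to a rate in (0, 1)."""
    return 1 / (1 + math.exp(-x))


def fit_trend(observations):
    """Fit a weighted linear trend on the logit scale.

    observations: list of (year, rate, weight) tuples pooled from
    several surveys; at least two distinct years are required.
    Returns a function that gives the fitted rate for any year.
    """
    total_w = sum(w for _, _, w in observations)
    mean_year = sum(w * yr for yr, _, w in observations) / total_w
    mean_z = sum(w * logit(r) for _, r, w in observations) / total_w
    sxx = sum(w * (yr - mean_year) ** 2 for yr, _, w in observations)
    sxz = sum(
        w * (yr - mean_year) * (logit(r) - mean_z)
        for yr, r, w in observations
    )
    slope = sxz / sxx
    # Centre on mean_year so the arithmetic stays numerically stable.
    return lambda year: inv_logit(mean_z + slope * (year - mean_year))


# Hypothetical completion rates from four surveys in one country.
# The weights stand in for how much each survey is trusted.
surveys = [
    (2010, 0.60, 1.0),
    (2014, 0.66, 2.0),
    (2015, 0.70, 1.0),
    (2019, 0.74, 2.0),
]

trend = fit_trend(surveys)
estimate_2017 = trend(2017)  # a year in which no survey was run
```

Because the line is fitted to all surveys at once, the estimate for a year without a survey, such as 2017 here, draws on every available data point rather than on the latest survey alone.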

Combining data sources to estimate out-of-school rates

The need for a methodology that combines data sources to estimate out-of-school rates was recognized 20 years ago, when it was acknowledged that ‘some sort of composite approach may be needed for estimating time series and producing estimates for the most recent year’.

In the absence of such an approach in the past, measurements were done using enrolment records from school censuses. But there were three challenges with this approach: enrolment records are often incomplete or inaccurate; those records needed to be combined with population estimates, which come from a different and often inconsistent source; and schools were not always able to determine students’ ages accurately.
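
The second of these challenges can be illustrated with a little arithmetic. The Python sketch below computes an out-of-school rate as the share of the school-age population that is not enrolled. The function name and all figures are hypothetical. When the population estimate comes from a source that is inconsistent with the enrolment count, the result can become implausible, in this case a negative rate.

```python
def out_of_school_rate(enrolled, population):
    """Share of the school-age population that is not enrolled."""
    if population <= 0:
        raise ValueError("population must be positive")
    return (population - enrolled) / population


# Hypothetical figures for one age group in one country.
enrolled = 950_000               # from the school census
population_consistent = 1_000_000
population_too_low = 900_000     # an inconsistent population estimate

rate_ok = out_of_school_rate(enrolled, population_consistent)
rate_bad = out_of_school_rate(enrolled, population_too_low)
```

With the consistent population figure the rate is 5%. With the lower population figure, enrolment exceeds the population and the rate turns negative, which is the kind of inconsistency that a combined approach is meant to handle.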

In recent years, many countries have carried out household surveys which, despite their own weaknesses, can help fill some gaps and address challenges related to age and population. A model was accordingly developed to add these sources to the administrative ones in order to get a better picture. The results of this model were reported for the first time in September 2022 and visualized on the VIEW website. Thanks to the new approach, new out-of-school rates were produced for countries such as Nigeria and Ethiopia that hadn’t had data reported for over a decade. The latest data release using this approach has estimated there are still 250 million children, adolescents and youth out of school.
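
As a rough illustration of what adding survey sources to administrative ones can mean in practice, the Python sketch below pools two estimates of the same rate by weighting each with its precision (the inverse of its squared standard error). This is a textbook technique used here as a stand-in, not the model described above. The function name, the rates and the standard errors are all hypothetical.

```python
def combine(estimates):
    """Precision-weighted average of independent estimates.

    estimates: list of (value, standard_error) pairs.
    Returns the pooled value and its standard error.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    pooled_se = (1 / total) ** 0.5
    return value, pooled_se


# Hypothetical out-of-school rates for the same country and year.
administrative = (0.20, 0.04)    # (rate, assumed standard error)
household_survey = (0.28, 0.02)

pooled_rate, pooled_se = combine([administrative, household_survey])
```

The pooled rate, about 0.264, sits closer to the source with the smaller standard error, and the pooled standard error is smaller than that of either source alone. A full model would also have to deal with the fact that the two sources measure slightly different things, which this sketch ignores.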

What are the challenges associated with data integration?

When combining data, the methods must be understood so that the results can be accurately interpreted. Even more critically, although these models can only be estimated at a global, central level, it is important to ensure that countries participate in the process and engage with it. This is important not only to make sure they feel ownership of the results but also to help develop the capacity of national statisticians so that they can feed into the model. As things stand, there is no systematic mechanism for countries to seek clarifications, understand the methods underpinning the estimates, contest results that contradict their own understanding of the actual situation, or proactively contribute data sources and ideas for the development of the models.

There are also technical issues to be ironed out, which will be discussed at this week’s conference, such as how female and male rates should be estimated and how to align the estimates of the out-of-school and completion rate models.

What further developments are needed?

We need to formalize good practice for the way estimates are reported with guidelines similar to GATHER, an approach followed in health statistics.

We need to build the participation of countries in these new models. Countries should review model results in a systematic way, familiarize themselves with the rationale and implications, identify errors and seek clarifications, contribute ideas to potential areas of model development, and provide additional and up-to-date data sources. Familiarizing ministries of education and the expert community with estimate-based SDG 4 indicators as a new way of monitoring progress requires extensive communication. The suggestion is that education takes the same approach as health, with the UNESCO Institute for Statistics covering the models in the workshops it is already running with countries. An inventory of surveys that will support data integration, ensuring countries are involved in the data inputs used, is also recommended.

A joint model combining the out-of-school and completion rates should be developed. The GEM Report and the UIS are currently working to develop a model that integrates the completion and out-of-school rate estimates to ensure they are consistent with each other.

New ways of integrating data should be considered to estimate other elements of SDG 4. The suggestion is that similar models could draw on data about, for instance, children who are over-age for their grade, learners in non-formal education, and more.