Annual Social and Economic Supplement – Economist Writing Every Day

Report on the Release of a Historical State-Level Demographic Dataset for Sustainable Development Goal (SDG) Analysis

Introduction

A new panel dataset describing the demographics of U.S. states from 1962 to 2024 has been compiled and made publicly available. Derived from the Current Population Survey’s (CPS) Annual Social and Economic Supplement, it is designed to support empirical research and policy analysis, with a particular focus on tracking progress toward the United Nations Sustainable Development Goals (SDGs). The data are provided at the state-year level so that researchers and students can obtain control variables for empirical studies without assembling them from individual survey files.

Dataset Overview and Relevance to SDGs

The dataset contains key demographic variables that serve as critical indicators for a range of SDGs. The availability of this data in a clean, longitudinal format enhances the capacity for robust analysis of socio-economic trends and inequalities.

  • Income: Directly supports analysis for SDG 1 (No Poverty) and SDG 10 (Reduced Inequalities) by allowing for the examination of income distribution and poverty levels across states and over time.
  • Education: Essential for monitoring SDG 4 (Quality Education), tracking educational attainment rates and disparities among different demographic groups.
  • Health Insurance Coverage: A key metric for SDG 3 (Good Health and Well-being), providing insight into access to healthcare services.
  • Sex and Marital Status: Enables research into SDG 5 (Gender Equality) by facilitating the analysis of gender-based disparities in economic and social outcomes.
  • Age and Race: These variables are fundamental for disaggregated analysis under SDG 10 (Reduced Inequalities), allowing researchers to investigate how outcomes vary across different population segments.
  • Employment-Related Variables: The data supports analysis related to SDG 8 (Decent Work and Economic Growth).

Advancing the Sustainable Development Agenda

This dataset is a practical tool for advancing evidence-based policymaking aligned with the SDG framework.

SDG 10: Reduced Inequalities

The primary strength of the dataset is its utility for studying inequality. By providing disaggregated data on income, education, and health by race, sex, and age at the state level, it allows for a granular investigation into the drivers of inequality, informing targeted policy interventions.

SDG 4 & 5: Quality Education and Gender Equality

Researchers can leverage the dataset to model the long-term impacts of state-level policies on educational attainment and gender gaps in income and labor force participation, providing critical feedback for achieving the targets set by SDG 4 and SDG 5.

SDG 16 & 17: Strong Institutions and Partnerships for the Goals

The public release of this pre-cleaned dataset exemplifies SDG 17 (Partnerships for the Goals) by contributing to the global data ecosystem. It promotes transparency and accessibility, which are core principles of SDG 16 (Peace, Justice and Strong Institutions), by empowering a wider range of stakeholders, including students and smaller research institutions, to engage in high-quality, data-driven analysis without prohibitive data processing costs.

Comparative Advantage Over Existing Data Sources

The dataset addresses common challenges researchers face when sourcing demographic data for SDG-related analysis.

  1. Comprehensive Coverage: Unlike other sources with more limited timeframes, this dataset extends from 1962 to 2024, offering a long-term perspective on demographic and socio-economic trends.
  2. Ease of Access: It consolidates data from numerous individual CPS files into ready-to-use Stata and Excel formats, overcoming the navigational difficulties of the main Census website and the data export limitations of tools like IPUMS.
  3. Focused Variables: It provides a curated set of variables essential for SDG analysis, avoiding the extraneous information found in broader datasets like the National Welfare Data.

Data Limitations and Methodological Recommendations

Users of the dataset should be aware of specific limitations to ensure methodologically sound analysis.

Data Inconsistencies

  • Geographic Coverage: Prior to 1977, data for some states is either missing or aggregated with neighboring states.
  • Variable Coding: The coding methodologies for certain variables have changed over time. Notably, the ‘race’ variable was recoded in 2003, and the ‘age’ variable has undergone several changes to its universe and top-coding.
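The two breaks described above can be handled before any analysis by restricting the sample and tagging each observation with its coding era. The following is a minimal sketch in Python: the row layout and names are hypothetical, and it assumes that 1977 is the first year with full state coverage and that observations from 2003 onward use the new race coding.

```python
# Hypothetical (state, year) rows; the real dataset's column names may differ.
rows = [
    ("OH", 1976), ("OH", 1977), ("OH", 2002), ("OH", 2003),
    ("WY", 1977), ("WY", 2002), ("WY", 2003),
]

FIRST_FULL_YEAR = 1977   # assumption: some states are missing or pooled before this year
RACE_RECODE_YEAR = 2003  # assumption: 'race' uses the new coding from this year onward


def tag_rows(rows):
    """Drop rows before FIRST_FULL_YEAR and tag the rest with a race-coding era."""
    tagged = []
    for state, year in rows:
        if year < FIRST_FULL_YEAR:
            continue  # skip years with incomplete or aggregated state coverage
        era = "pre_recode" if year < RACE_RECODE_YEAR else "post_recode"
        tagged.append((state, year, era))
    return tagged


tagged = tag_rows(rows)
```

Comparisons involving race can then be made within one era at a time, so that a change in category definitions is not mistaken for a demographic change.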

Analytical Guidance

Because of these inconsistencies over time, the dataset is better suited to cross-sectional comparisons between states within a given year than to measuring national trends over the entire period. For regression analyses, including year fixed effects is strongly recommended: they absorb any shift that affects all states in a given year, such as a change in variable coding. The data cleaning code is publicly available to ensure full transparency and reproducibility of the research process, aligning with the principles of open science and SDG 17.
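To illustrate why year fixed effects help, the sketch below uses made-up numbers in which the outcome equals twice the regressor plus a shift of 10 that applies to every state in 2003, standing in for a coding change. Subtracting each year's mean from both variables (equivalent to including year dummies in a one-regressor model) removes the shift. All names and values are illustrative assumptions, not the dataset's own.

```python
from collections import defaultdict

# Hypothetical state-year observations: (year, x, y).
# y = 2 * x, plus a shift of +10 for every observation in 2003.
data = [
    (2002, 1.0, 2.0), (2002, 2.0, 4.0), (2002, 3.0, 6.0),
    (2003, 1.0, 12.0), (2003, 2.0, 14.0), (2003, 4.0, 18.0),
]


def demean_by_group(data):
    """Subtract each group's mean from x and y (the within-group transformation)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for group, x, y in data:
        s = sums[group]
        s[0] += x
        s[1] += y
        s[2] += 1
    return [
        (x - sums[g][0] / sums[g][2], y - sums[g][1] / sums[g][2])
        for g, x, y in data
    ]


def ols_slope(pairs):
    """Slope from regressing demeaned y on demeaned x."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / sxx


# With year fixed effects: demean within each year.
slope_fe = ols_slope(demean_by_group(data))

# Without them: treat all rows as one group (a pooled regression with an intercept).
slope_pooled = ols_slope(demean_by_group([(0, x, y) for _, x, y in data]))
```

In this toy example the within-year estimate recovers the slope of 2, while the pooled estimate is pulled away from it by the 2003 shift. Year fixed effects only remove shifts common to all states in a year; they do not correct changes that affect states differently.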

Sustainable Development Goals (SDGs) Addressed in the Article

  • SDG 4: Quality Education

    The article directly connects to quality education by providing a resource intended to help “students who might not be able to make one themselves.” The author aims to facilitate “empirical project[s]” for students by offering an easy-to-use dataset, thereby supporting educational and research activities.

  • SDG 10: Reduced Inequalities

    This is a core theme, as the dataset contains key demographic variables (“age, race, sex, marital status, income, education”) that are essential for studying and measuring disparities. The data allows researchers to analyze how outcomes differ across various population groups, which is fundamental to understanding and addressing inequality.

  • SDG 16: Peace, Justice and Strong Institutions

    The article addresses the principle of public access to information. The author notes that official sources like the “Census’ website is difficult to navigate” and takes action to make public data more accessible. By cleaning the data and sharing it publicly, the author strengthens the practical application of ensuring the public can access and use information held by public institutions.

  • SDG 17: Partnerships for the Goals

    The article is an example of a partnership for knowledge sharing. The author makes the dataset and “cleaning code” available to the entire research community (“researchers who would otherwise spend a couple hours re-doing the same work”). This act of sharing data and tools enhances the capacity of others to conduct research that can inform progress on various SDGs.

Specific SDG Targets Identified

  • Target 10.2: Empower and promote the social, economic and political inclusion of all, irrespective of age, sex, disability, race, ethnicity, origin, religion or economic or other status.

    The dataset is explicitly designed to support analysis across the dimensions mentioned in this target. The article lists variables for “age, race, sex, income, education,” which are the necessary components to measure the economic inclusion or exclusion of different groups.

  • Target 16.10: Ensure public access to information and protect fundamental freedoms, in accordance with national legislation and international agreements.

    The author’s entire project is a direct response to the difficulty of accessing public data, as mentioned in the article: “Census’ website is difficult to navigate and mostly offers its data one year at a time.” By creating and sharing a clean panel dataset, the author is actively promoting and facilitating public access to information.

  • Target 17.18: By 2020, enhance capacity-building support to developing countries… to increase significantly the availability of high-quality, timely and reliable data disaggregated by income, gender, age, race, ethnicity…

    Although the target concerns developing countries and the article focuses on US data, the project illustrates the same principle. The author has created a “high-quality, timely and reliable” dataset (covering 1962 to 2024) that is disaggregated by characteristics listed in the target: “age, race, sex… income, education.” This action increases the availability of such data for researchers and students.

Indicators Mentioned or Implied

  • Implied Indicators for Inequality (SDG 10)

    The article implies the use of indicators that measure inequality by providing the necessary data. The variables “income,” “race,” “sex,” and “education” allow for the calculation of various disparity metrics, such as income distribution by race or the gender gap in educational attainment. These are used to measure progress towards targets like 10.2.

  • Implied Indicators for Education (SDG 4)

    The “education” variable in the dataset allows for the measurement of educational attainment levels across different demographic groups (e.g., by sex, race, income). This can be used to construct indicators related to Target 4.5 (eliminate gender disparities in education) and Target 4.3 (ensure equal access to tertiary education).

  • Implied Indicators for Health (SDG 3)

    The inclusion of “health insurance” as a variable directly implies indicators related to Target 3.8 (Achieve universal health coverage). The percentage of the population with health insurance, disaggregated by the other variables in the dataset (age, race, income), is a key indicator for measuring access to healthcare services.

  • Practical Fulfillment of an Indicator for Public Access (SDG 16)

    The article’s content serves as a practical example related to Indicator 16.10.2 (Number of countries that have adopted and implemented… guarantees for public access to information). The author’s work is a direct implementation of this principle, demonstrating a non-governmental effort to make public information genuinely accessible and usable, thereby fulfilling the spirit of the indicator.
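One way to turn the education variable into a disparity measure for Target 4.5 is a parity index: the ratio of one group's attainment rate to a reference group's rate, where 1.0 means parity. The sketch below is illustrative only. The field names (`ba_share_female`, `ba_share_male`) and the values are hypothetical and would need to be mapped to the dataset's actual variables.

```python
def parity_index(rate_group, rate_reference):
    """Ratio of a group's attainment rate to the reference group's rate."""
    if rate_reference == 0:
        raise ValueError("reference rate must be non-zero")
    return rate_group / rate_reference


# Hypothetical state-year rows: share of adults holding a bachelor's degree, by sex.
rows = [
    {"state": "A", "year": 2020, "ba_share_female": 0.36, "ba_share_male": 0.30},
    {"state": "B", "year": 2020, "ba_share_female": 0.24, "ba_share_male": 0.32},
]

# Female-to-male parity index for each state-year.
gender_parity = {
    (r["state"], r["year"]): parity_index(r["ba_share_female"], r["ba_share_male"])
    for r in rows
}
```

Values above 1.0 indicate higher attainment in the first group, and values below 1.0 indicate lower attainment. The same function can be applied to rates by race or age group.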

SDGs, Targets, and Indicators Analysis

| SDGs | Targets | Indicators (Identified or Implied) |
| --- | --- | --- |
| SDG 3: Good Health and Well-being | 3.8: Achieve universal health coverage. | Implied: Health insurance coverage rates, disaggregated by age, race, and income, based on the “health insurance” variable in the dataset. |
| SDG 4: Quality Education | 4.5: Eliminate gender disparities in education and ensure equal access. | Implied: Parity indices for educational attainment, calculated using the “education,” “sex,” and “race” variables. |
| SDG 10: Reduced Inequalities | 10.2: Empower and promote the social, economic and political inclusion of all. | Implied: Measures of income distribution (e.g., Gini coefficient) and poverty/income levels disaggregated by “age, race, sex” as provided in the dataset. |
| SDG 16: Peace, Justice and Strong Institutions | 16.10: Ensure public access to information. | Identified: The act of creating and sharing the dataset is a practical fulfillment of policies for public access to information, as described in the article’s motivation. |
| SDG 17: Partnerships for the Goals | 17.18: Increase the availability of high-quality, timely and reliable disaggregated data. | Identified: The dataset itself is a “high-quality, timely and reliable” source of data disaggregated by “age, race, sex, income, education,” directly matching the indicator’s requirements. |
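The Gini coefficient named in the SDG 10 row can be sketched as follows. Because the dataset is at the state-year level, this example computes an unweighted Gini across state mean incomes within a single year, which measures dispersion between states and is not the same as a Gini over individuals. The state labels and income values are made up for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical mean incomes for four states in one year.
state_mean_income = {"A": 40000.0, "B": 50000.0, "C": 60000.0, "D": 90000.0}

between_state_gini = gini(state_mean_income.values())
```

Computing the measure within one year at a time is consistent with the guidance that the data are best used for cross-sectional comparisons. Weighting states by population would require a population variable, which is not assumed here.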

Source: economistwritingeveryday.com