Salvador Urban Network Transportation (SUNT): A Landmark Spatiotemporal Dataset for Public Transportation – Nature

Report on the Salvador Urban Network Transportation (SUNT) Dataset and its Alignment with Sustainable Development Goals

Executive Summary: The SUNT Dataset as a Catalyst for Sustainable Urban Development

Efficient public transportation is a cornerstone of sustainable urban development, directly impacting several of the United Nations’ Sustainable Development Goals (SDGs). This report details the Salvador Urban Network Transportation (SUNT) dataset, a comprehensive, openly available resource designed to address critical challenges in urban mobility. The dataset, collected in Salvador, Brazil, provides a robust foundation for data-driven policymaking and research aimed at creating inclusive, safe, resilient, and sustainable cities. By offering detailed insights into passenger and vehicle dynamics, SUNT enables the development of transportation systems that reduce inequality, improve environmental quality, and foster economic growth, aligning with the core principles of the 2030 Agenda for Sustainable Development.

1.0 Introduction: Urban Mobility and the Sustainable Development Goals

The management of public transportation is intrinsically linked to achieving global sustainability targets. Inefficient systems, particularly in developing nations, exacerbate urban challenges by limiting access for low-income populations, increasing traffic congestion, and contributing to environmental degradation. These outcomes run counter to the objectives of several SDGs. The SUNT dataset was developed to provide the empirical evidence needed to address these issues and advance the following goals:

  • SDG 11 (Sustainable Cities and Communities): By providing data to create safe, affordable, accessible, and sustainable transport systems for all.
  • SDG 13 (Climate Action): By enabling optimization strategies that reduce fuel consumption and greenhouse gas emissions from the transport sector.
  • SDG 10 (Reduced Inequalities): By ensuring public transit, often the only option for low-income groups, is efficient and serves all communities equitably.
  • SDG 9 (Industry, Innovation, and Infrastructure): By fostering innovation in Intelligent Transportation Systems (ITS) and supporting the development of resilient urban infrastructure.
  • SDG 3 (Good Health and Well-being): By reducing air pollution and traffic-related stress, and improving road safety.

The SUNT dataset, one of the largest of its kind, covers the integrated public transportation network of Salvador, Brazil, a city with nearly 3 million residents. It includes high-frequency data from regular buses, Bus Rapid Transit (BRT), and subway systems, capturing the movements of approximately 700,000 passengers daily. This report outlines the dataset’s methodology, technical validity, and transformative potential for achieving sustainable urban futures.

2.0 Methodology for a Sustainable Impact

The creation of the SUNT dataset involved a multi-stage process designed to produce a reliable and comprehensive tool for transportation planning and research, with a clear focus on generating actionable insights for sustainability.

2.1 Data Collection and Sources

Data was gathered from four primary systems to create a holistic view of the urban transit network:

  1. Automatic Vehicle Location (AVL): Provided real-time geospatial and temporal data for approximately 2,000 vehicles across nearly 400 lines, crucial for monitoring operational efficiency and environmental impact (SDG 11, SDG 13).
  2. Automatic Fare Collection (AFC): Recorded anonymized passenger payment data from buses, BRT, and subway systems, offering insights into travel patterns and demand, which is vital for ensuring equitable access (SDG 10).
  3. General Transit Feed Specification (GTFS): Supplied static schedule and network data, defining the structural backbone of the transport system, including stops, routes, and service times.
  4. Local Trip Information (LTI): Contained details on expected and actual trip times, complementing AVL data to ensure a complete record of vehicle activity.

2.2 Data Integration via Trip Chaining

A significant challenge in public transport analysis is the lack of alighting data: fare systems typically record a payment when a passenger boards, not when they leave the vehicle. The SUNT project addressed this by implementing a “Trip Chaining” methodology. This process integrates AVL and AFC data to infer passenger journeys:

  • Boarding Point Identification: The system matches the time and location of a fare payment (AFC) with a vehicle’s position (AVL) to determine the boarding stop.
  • Alighting Point Inference: A passenger’s alighting point from one trip is inferred from the boarding point of their subsequent trip. This creates a complete Origin-Destination (OD) matrix, which is fundamental for understanding travel behavior and planning services that meet community needs (SDG 11).
  • Data Validation: Thresholds for time (5-20 minutes) and walking distance (1.1 km) were established based on local conditions to ensure the validity of inferred trips, filtering out inconsistent records.
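The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the SUNT pipeline itself: the names `AvlPing`, `boarding_stop`, and `infer_alighting` are hypothetical, timestamps are assumed to be seconds since midnight, and the "closest downstream stop" rule is the common form of trip chaining, assumed here because the text does not spell out the exact rule. Only the 1.1 km walking threshold comes from the text; the 5-20 minute time threshold is left out because the text does not say how it is applied.

```python
from bisect import bisect_left
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

MAX_WALK_KM = 1.1  # walking-distance threshold quoted in the text


@dataclass
class AvlPing:
    """One AVL record (hypothetical shape)."""
    timestamp: int  # seconds since midnight
    stop_id: str    # stop nearest to the vehicle at that time


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def boarding_stop(pings, tap_time):
    """Stop of the AVL ping closest in time to the fare payment.

    `pings` must be non-empty and sorted by timestamp.
    """
    times = [p.timestamp for p in pings]
    i = bisect_left(times, tap_time)
    candidates = pings[max(i - 1, 0): i + 1]  # ping before and ping after the tap
    return min(candidates, key=lambda p: abs(p.timestamp - tap_time)).stop_id


def infer_alighting(route_stops, boarding_idx, next_boarding_xy, stop_xy):
    """Downstream stop closest to the passenger's next boarding location.

    Returns None when no downstream stop lies within MAX_WALK_KM, so the
    trip is dropped as inconsistent rather than guessed.
    """
    downstream = route_stops[boarding_idx + 1:]
    if not downstream:
        return None
    best = min(downstream, key=lambda s: haversine_km(stop_xy[s], next_boarding_xy))
    if haversine_km(stop_xy[best], next_boarding_xy) <= MAX_WALK_KM:
        return best
    return None
```

Counting the resulting (boarding stop, alighting stop) pairs over all passengers gives the OD matrix entries.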

2.3 Graph Modeling for Network Analysis

The processed OD data was modeled as a spatio-temporal graph, G = {V, E}, where:

  • Vertices (V): Represent the 2,871 bus stops and stations. Node attributes include passenger boarding/alighting counts and vehicle load, which are essential for identifying high-demand areas and potential infrastructure bottlenecks.
  • Edges (E): Represent the 4,526 feasible routes between stops. Edge attributes include distance, trip duration, and average vehicle velocity, enabling analysis of traffic flow and congestion to inform strategies for reducing emissions (SDG 13).
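As a rough sketch of the structure just described, the graph can be held in two dictionaries, one for stops and one for stop-to-stop links. The class and field names below (`Stop`, `Link`, `TransitGraph`) are hypothetical, the edges are assumed to be directed, and the time dimension is omitted: this represents a single snapshot rather than the full spatio-temporal series.

```python
from dataclasses import dataclass, field


@dataclass
class Stop:
    """Node attributes named in the text."""
    boardings: int = 0
    alightings: int = 0
    load: int = 0


@dataclass
class Link:
    """Edge attributes named in the text; velocity is derived from the other two."""
    distance_km: float
    duration_min: float

    @property
    def avg_speed_kmh(self):
        return self.distance_km * 60.0 / self.duration_min


@dataclass
class TransitGraph:
    stops: dict = field(default_factory=dict)  # stop_id -> Stop
    links: dict = field(default_factory=dict)  # (from_id, to_id) -> Link

    def add_link(self, u, v, distance_km, duration_min):
        """Add a directed link, creating empty stop records if needed."""
        self.stops.setdefault(u, Stop())
        self.stops.setdefault(v, Stop())
        self.links[(u, v)] = Link(distance_km, duration_min)

    def neighbors(self, u):
        """Stops reachable from u over a single link."""
        return [v for (a, v) in self.links if a == u]
```

Deriving the average speed from distance and duration, rather than storing it, keeps the three edge attributes consistent with one another.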

This graph structure provides a powerful framework for applying advanced machine learning techniques to optimize the network for sustainability, efficiency, and equity.

3.0 Technical Validation and Relevance to SDG Targets

The integrity of the SUNT dataset was rigorously validated to ensure its suitability for evidence-based decision-making in transportation planning.

3.1 Statistical and Temporal Validation

Analysis of key metrics such as passenger boardings, alightings, and vehicle loads confirmed that the data distributions align with expected real-world patterns, including skewed distributions indicating high activity at a few central hubs. Temporal analysis revealed clear daily and weekly cycles, with distinct rush-hour peaks and lower traffic on weekends. This validation confirms that SUNT accurately reflects the city’s rhythm, making it a reliable tool for planning services that match demand, thereby improving efficiency and reducing resource waste (SDG Target 11.2).
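A check for daily cycles of this kind can be as simple as binning boardings by hour and looking at the busiest bins. The sketch below assumes boarding times are available as seconds since midnight; the function names are hypothetical and no real SUNT values are used.

```python
from collections import Counter


def hourly_boardings(tap_times):
    """Count boardings per hour of day from timestamps in seconds since midnight."""
    return Counter(t // 3600 for t in tap_times)


def peak_hours(counts, k=2):
    """Return the k busiest hours, busiest first."""
    return [hour for hour, _ in counts.most_common(k)]
```

On a typical weekday one would expect the two busiest hours to fall in the morning and evening commute; a flat profile would suggest a problem in the data or in the binning.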

3.2 Spatial Validation using Graph Neural Networks (GNNs)

To demonstrate the dataset’s utility for advanced spatial analysis, classification tasks were performed using GNNs to predict passenger loading (node classification) and average velocity (edge classification). The models achieved satisfactory performance (accuracy >60%), validating that the graph structure captures meaningful spatio-temporal relationships. This capability is critical for route optimization, which can lead to reduced travel times and lower fuel consumption, directly supporting climate action goals (SDG 13).
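The text does not say which GNN architecture was used. As a loose illustration of why graph structure helps, the sketch below shows only the neighborhood-aggregation step that GNN layers build on, with no learned weights: each stop's feature is replaced by the mean of its own value and the values of stops that feed into it. The function name and the feature (a single passenger-load number per stop) are assumptions made for the example.

```python
def mean_aggregate(features, edges):
    """One round of mean aggregation over a directed graph.

    features: dict mapping node -> float
    edges:    iterable of (source, target) pairs
    Returns a new dict where each node holds the mean of its own feature
    and the features of its in-neighbors.
    """
    incoming = {node: [value] for node, value in features.items()}
    for u, v in edges:
        incoming[v].append(features[u])
    return {node: sum(vals) / len(vals) for node, vals in incoming.items()}
```

A trained GNN would apply learned transformations before and after this averaging and stack several such rounds; the aggregated values would then feed a classifier that assigns each stop a load class.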

3.3 Transportation Planning Validation

The dataset is already being used to inform practical transportation planning decisions that align with sustainability objectives:

  • Optimizing Vehicle Capacity: By analyzing maximum passenger loads at different times of the day, planners can calculate the required number of vehicles to meet demand without causing overcrowding or running empty buses. This enhances service quality and operational efficiency.
  • Improving Timetables: The data enables the strategic designation of “Normal” and “Express” services, reducing unnecessary stops, saving fuel, and minimizing passenger waiting times. This contributes to a more efficient and less polluting transport system (SDG 11, SDG 13).
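The capacity calculation in the first point can be sketched as follows. This assumes the load on a vehicle is the running total of boardings minus alightings along its stops, and that the number of departures needed in a period is the peak load divided by a per-vehicle capacity, rounded up. The function names are hypothetical, and the capacity is a parameter the planner must supply; nothing here is taken from SUNT.

```python
from math import ceil


def load_profile(boardings, alightings):
    """Passengers on board after each stop, in stop order."""
    load, profile = 0, []
    for on, off in zip(boardings, alightings):
        load += on - off
        profile.append(load)
    return profile


def vehicles_needed(max_load, capacity):
    """Smallest number of vehicles whose combined capacity covers max_load."""
    return ceil(max_load / capacity)
```

For example, a period whose peak demand through the busiest section is 400 passengers needs `vehicles_needed(400, 80)` departures if each vehicle is assumed to hold 80.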

3.4 Cross-Dataset Validation for Broader Urban Insights

SUNT was successfully integrated with a public dataset of municipal schools in Salvador. By correlating student transportation card usage with school locations, it was possible to validate passenger load patterns around educational institutions. This demonstrates SUNT’s potential as a foundational layer for broader smart city applications, such as planning safe and accessible routes to schools and other essential services, thereby fostering inclusive and equitable communities (SDG 10, SDG 11).
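One way to link the two datasets is a simple radius search: find the stops within walking distance of a school and sum the student-card boardings recorded there. The sketch below is an assumption about how such a join could be done, not the method used in the paper; the 0.5 km default radius and all names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def stops_near(school_xy, stop_xy, radius_km=0.5):
    """IDs of stops within radius_km of the school, sorted for stable output."""
    return sorted(s for s, xy in stop_xy.items() if haversine_km(school_xy, xy) <= radius_km)


def student_boardings_near(school_xy, stop_xy, student_boardings, radius_km=0.5):
    """Total student-card boardings at stops within radius_km of the school."""
    return sum(student_boardings.get(s, 0) for s in stops_near(school_xy, stop_xy, radius_km))
```

Comparing this total across schools, or across school days and holidays, gives a quick sanity check that student demand concentrates where and when it should.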

4.0 Conclusion: Future Directions for a Sustainable Planet

The SUNT dataset is more than a collection of data; it is a strategic asset for advancing the Sustainable Development Goals. By providing an unprecedented level of detail on urban mobility, it empowers researchers, policymakers, and transit agencies to move from reactive management to proactive, data-driven planning.

The public availability of SUNT fosters global collaboration (SDG 17) and opens avenues for innovative research, including:

  • Developing advanced machine learning models for traffic forecasting and anomaly detection.
  • Creating multi-objective optimization algorithms to balance efficiency, cost, and environmental impact.
  • Simulating policy changes to assess their impact on equity and sustainability before implementation.
  • Fine-tuning foundation models for use in other cities that lack comprehensive data, democratizing the tools for sustainable urban planning.

By leveraging the SUNT dataset, cities can design public transportation systems that are not only efficient but also equitable and environmentally responsible, making a tangible contribution to building a sustainable future for all.

Analysis of Sustainable Development Goals in the Article

1. Which SDGs are addressed or connected to the issues highlighted in the article?

The article on the Salvador Urban Network Transportation (SUNT) dataset addresses and connects to several Sustainable Development Goals (SDGs) by focusing on improving urban mobility, reducing environmental impact, promoting social equity, and fostering innovation in infrastructure.

  • SDG 9: Industry, Innovation and Infrastructure: The article’s core is the creation and application of a novel, comprehensive dataset (SUNT) and the use of advanced technologies like Machine Learning (ML), intelligent transportation systems (ITS), and Graph Neural Networks (GNNs) to improve public transportation infrastructure. This aligns with building resilient infrastructure and fostering innovation.
  • SDG 10: Reduced Inequalities: The article explicitly states that in developing countries, public transport is “often the only means of transport available to low-income populations.” By providing data to create a more efficient, affordable, and accessible system, the work aims to improve mobility for vulnerable populations, thereby reducing inequality in access to urban services and opportunities.
  • SDG 11: Sustainable Cities and Communities: This is the most central SDG in the article. The entire project is geared towards making urban transport systems in Salvador, Brazil, more sustainable. It directly discusses providing “comprehensive coverage of population mobility,” reducing “traffic congestion,” improving safety, and making transport more affordable and accessible.
  • SDG 13: Climate Action: The article repeatedly highlights the environmental benefits of efficient public transportation, such as the “significant reduction of environmental impact limiting gas emissions and pollution” and “reducing… carbon emissions.” Improving public transit is a key strategy for climate change mitigation in urban areas.

2. What specific targets under those SDGs can be identified based on the article’s content?

Based on the issues and solutions discussed, several specific SDG targets can be identified:

  1. Target 11.2: By 2030, provide access to safe, affordable, accessible and sustainable transport systems for all, improving road safety, notably by expanding public transport, with special attention to the needs of those in vulnerable situations, women, children, persons with disabilities and older persons.
    • Explanation: The article’s primary goal is to improve public transportation to provide “comprehensive coverage,” reduce “transport costs,” and enhance the “trip experiences.” It specifically mentions its importance for “low-income populations” and discusses data related to passengers “aged 65 or older.”
  2. Target 11.6: By 2030, reduce the adverse per capita environmental impact of cities, including by paying special attention to air quality and municipal and other waste management.
    • Explanation: The article explicitly states that efficient public transport leads to a “significant reduction of environmental impact limiting gas emissions and pollution” and helps in “reducing traffic jams and carbon emissions.”
  3. Target 9.1: Develop quality, reliable, sustainable and resilient infrastructure, including regional and transborder infrastructure, to support economic development and human well-being, with a focus on affordable and equitable access for all.
    • Explanation: The SUNT dataset is presented as a tool to model and improve the quality, reliability, and efficiency of the public transport infrastructure in Salvador. The article discusses optimizing the system to avoid “delayed and overloaded vehicles” and improve overall service quality.
  4. Target 9.5: Enhance scientific research, upgrade the technological capabilities of industrial sectors in all countries, in particular developing countries, including, by 2030, encouraging innovation and substantially increasing the number of research and development workers per 1 million people and public and private research and development spending.
    • Explanation: The development and public sharing of the SUNT dataset is an act of enhancing scientific research. The paper states its significance lies in “advancing scientific research” and providing a resource for researchers to “develop and evaluate a wide range of data-driven methods.”
  5. Target 10.2: By 2030, empower and promote the social, economic and political inclusion of all, irrespective of age, sex, disability, race, ethnicity, origin, religion or economic or other status.
    • Explanation: By focusing on improving a service that is the “primary means of transport for low-income populations,” the work directly contributes to their social and economic inclusion by providing better and more affordable access to jobs, education, and other services across the city.

3. Are there any indicators mentioned or implied in the article that can be used to measure progress towards the identified targets?

Yes, the article mentions and implies numerous quantitative and qualitative indicators that can be used to measure progress.

  • For Target 11.2 (Access to Sustainable Transport):
    • Passenger Volume and Coverage: The dataset includes “daily information from about 700,000 passengers,” data on “approximately 400 lines, connecting almost 3,000 stops and stations,” and details on “boarding and alighting.” These figures can be used to measure the proportion of the population with convenient access to public transport.
    • Travel and Wait Times: The dataset contains attributes like “trip_time,” “wait_time,” and “walk_time,” which are direct indicators of the system’s accessibility and efficiency.
    • Transport Cost: The AFC (Automatic Fare Collection) data includes the “value” (trip cost), which is a direct measure of affordability.
    • Passenger Load: Metrics like “passenger loading” and “max load” help assess vehicle overcrowding and service quality.
  • For Target 11.6 (Reduce Environmental Impact):
    • Reduced Gas Emissions: Emissions are not measured directly, but the article’s goal of “reducing traffic jams and carbon emissions” implies that the dataset can be used to model and estimate emission reductions achieved by optimizing routes and reducing the number of vehicles needed.
    • Traffic Congestion: The article mentions “better control of traffic congestion” as a key benefit. Indicators like “average_speed” and “trip duration” from the dataset can be used to measure changes in congestion levels.
  • For Target 9.1 (Quality Infrastructure):
    • System Reliability: Data on “delayed and overloaded vehicles,” “trip duration,” and “average velocity” serve as indicators of the transport system’s reliability and quality.
    • System Efficiency: The “Renovation Factor (RF)” is mentioned as a “metric used in transportation research to assess the total demand in a line,” which is a key performance indicator for infrastructure efficiency.
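The text names the Renovation Factor but does not give its formula. One formulation used in transit planning, assumed here purely for illustration, divides the total number of passengers boarding along a line by the maximum load on its busiest section; a value well above 1 means seats are reused several times along the route. The function name is hypothetical.

```python
def renovation_factor(boardings, alightings):
    """Total boardings divided by the maximum on-board load along the line.

    boardings / alightings are per-stop counts in stop order.
    This definition is an assumption; the source does not state the formula.
    """
    load, max_load = 0, 0
    for on, off in zip(boardings, alightings):
        load += on - off
        max_load = max(max_load, load)
    if max_load == 0:
        raise ValueError("no passengers on board at any stop")
    return sum(boardings) / max_load
```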

4. Summary Table of SDGs, Targets, and Indicators

SDG 11: Sustainable Cities and Communities
Target 11.2: Provide access to safe, affordable, accessible and sustainable transport systems for all.
Indicators identified in the article:
  • Number of passengers served daily (approx. 700,000).
  • Number of transport lines (approx. 400) and stops/stations (approx. 3,000).
  • Data on boarding/alighting, travel time, waiting time, and walking distance.
  • Trip cost data from AFC system.
  • Passenger load and max load metrics to assess overcrowding.

SDG 11: Sustainable Cities and Communities
Target 11.6: Reduce the adverse per capita environmental impact of cities.
Indicators identified in the article:
  • Stated goal of reducing “gas emissions and pollution” and “carbon emissions.”
  • Metrics for measuring traffic congestion (average speed, trip duration).

SDG 9: Industry, Innovation and Infrastructure
Target 9.1: Develop quality, reliable, sustainable and resilient infrastructure.
Indicators identified in the article:
  • Data on delayed and overloaded vehicles.
  • Metrics for system efficiency and quality (average speed, trip time, Renovation Factor).

SDG 9: Industry, Innovation and Infrastructure
Target 9.5: Enhance scientific research and encourage innovation.
Indicators identified in the article:
  • The creation and public sharing of the SUNT dataset itself.
  • Application of advanced methods (ML, GNNs, ITS) for analysis.

SDG 10: Reduced Inequalities
Target 10.2: Promote the social and economic inclusion of all.
Indicators identified in the article:
  • Focus on improving transport for “low-income populations.”
  • Data on transport costs to measure affordability.
  • Analysis of mobility for specific groups (e.g., students, elderly).

SDG 13: Climate Action
Target 13.2: Integrate climate change measures into national policies, strategies and planning.
Indicators identified in the article:
  • The project’s aim to inform public policy for transportation planning that inherently reduces carbon emissions.

Source: nature.com