An RGB-TIR Dataset from UAV Platform for Robust Urban Traffic Scenes Semantic Segmentation
Report on the Kust4K Dataset for Advancing Sustainable Urban Development
Abstract
This report details the Kust4K dataset, a novel Unmanned Aerial Vehicle (UAV)-based resource for RGB-Thermal Infrared (TIR) multimodal semantic segmentation. The dataset was developed to address critical limitations in existing data resources, thereby accelerating research and innovation in line with Sustainable Development Goal 9 (Industry, Innovation, and Infrastructure). By providing 4,024 high-quality, pixel-aligned RGB-TIR image pairs of diverse urban road scenes, Kust4K facilitates the development of robust intelligent transportation systems. Such systems are essential for creating safer, more resilient, and sustainable urban environments, directly contributing to Sustainable Development Goal 11 (Sustainable Cities and Communities). Extensive experiments confirm that multimodal training significantly enhances semantic segmentation reliability, underscoring the dataset’s value in advancing robust urban scene understanding for sustainable development.
1.0 Introduction: Aligning Technological Innovation with Sustainable Development Goals
1.1 The Urbanization Challenge and SDG 11
Rapid global urbanization presents significant challenges to urban infrastructure, particularly transportation systems. Inefficient traffic management compromises safety, increases pollution, and hinders economic productivity, creating barriers to achieving SDG 11, which aims to make cities inclusive, safe, resilient, and sustainable. Traditional monitoring methods, such as fixed cameras, are often inadequate for the dynamic and complex nature of modern urban environments, failing to provide the comprehensive data needed for effective management.
1.2 Fostering Innovation for Resilient Infrastructure (SDG 9)
UAV remote sensing technology represents a significant innovation for urban monitoring. Its flexibility and broad coverage offer a cost-effective solution for gathering high-resolution data. The integration of multimodal sensors, specifically RGB and TIR, further enhances this capability. This technological advancement aligns with SDG 9 by providing a foundational tool for building resilient infrastructure and fostering innovation in intelligent transportation. The complementary nature of RGB (color and texture) and TIR (thermal radiation) data allows for robust perception in variable conditions, such as low illumination, which is critical for ensuring 24/7 operational reliability for urban safety and traffic management systems.
1.3 Gaps in Existing Data Resources
The development of effective, data-driven models for urban scene understanding has been hampered by limitations in existing UAV-based datasets. These limitations present obstacles to creating technologies that can reliably contribute to SDG targets:
- High Scene Redundancy: Many datasets contain consecutive frames from similar scenes, offering minimal new information and limiting the generalization capabilities of trained models.
- Low Data Volume: The labor-intensive nature of annotation has resulted in smaller datasets, which are insufficient for training complex deep learning models required for robust performance.
- Limited Robustness: A reliance on single-modality (RGB) data restricts operational effectiveness to daytime and ideal weather, failing to meet the all-weather, all-time requirements for sustainable and resilient city management (SDG 11).
1.4 The Kust4K Dataset: A Resource for Sustainable Transportation
To address these gaps, the Kust4K dataset was developed. It is a large-scale, multimodal UAV semantic segmentation dataset specifically designed to advance robust urban traffic scene understanding. By providing diverse, high-information-density data and including scenarios with simulated sensor failure, Kust4K serves as a critical resource for developing and validating technologies that directly support the objectives of SDG 9 and SDG 11.
2.0 Methodology: A Framework for High-Quality Data Generation
The construction of the Kust4K dataset involved a systematic four-stage process designed to ensure data quality, relevance, and utility for developing technologies aligned with sustainable urban development.
- Data Acquisition: A hexacopter drone equipped with dual-spectrum RGB and TIR cameras was utilized. To maximize information density and support the development of generalizable models, synchronized frames were sampled every 15 frames from recorded videos, yielding 4,024 unique paired images.
- Data Registration: To ensure accurate spatial alignment between RGB and TIR images—a critical step for effective multimodal fusion—a combination of the Scale-Invariant Feature Transform (SIFT) algorithm and subsequent manual registration was employed. This rigorous process guarantees the high-fidelity data necessary for precise, pixel-level analysis of urban environments (a sketch of the automated SIFT step appears after this list).
- Data Annotation: A semi-automated annotation workflow was implemented using the Labelme tool integrated with the Segment Anything Model (SAM). Expert annotators refined initial masks by leveraging complementary information from both RGB and TIR modalities. This process focused on eight categories essential for urban traffic analysis and sustainable city planning: Road, Building, Motorcycle, Car, Truck, Tree, Person, and Traffic Facilities. A cross-verification protocol was enforced to ensure the highest quality annotations.
- Dataset Generation: Final annotations were converted from JSON to PNG format mask images, with distinct pixel values assigned to each of the eight categories. This standardized format facilitates direct use in training and evaluating semantic segmentation models.
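To illustrate the automated part of the registration step, the snippet below estimates a homography between a TIR frame and its RGB counterpart using OpenCV's SIFT implementation. It is a minimal sketch under stated assumptions: the file names, the Lowe ratio threshold, and the choice of a RANSAC-fitted homography are illustrative, and it does not reproduce the manual refinement the dataset authors applied afterwards.

```python
import cv2
import numpy as np

# Hypothetical file names; the actual Kust4K file layout is not specified here.
rgb = cv2.imread("pair_0001_rgb.png", cv2.IMREAD_GRAYSCALE)
tir = cv2.imread("pair_0001_tir.png", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and descriptors in both modalities.
sift = cv2.SIFT_create()
kp_rgb, des_rgb = sift.detectAndCompute(rgb, None)
kp_tir, des_tir = sift.detectAndCompute(tir, None)

# Match descriptors and keep only distinctive matches (Lowe's ratio test).
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des_tir, des_rgb, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Estimate a homography mapping TIR coordinates onto the RGB frame.
src = np.float32([kp_tir[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_rgb[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp the TIR image into alignment; manual refinement would follow this step.
aligned_tir = cv2.warpPerspective(tir, H, (rgb.shape[1], rgb.shape[0]))
```

Because cross-modal feature matching can be unreliable, the subsequent manual registration described above remains the decisive quality-control step.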
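The final conversion from Labelme JSON annotations to single-channel PNG masks can be sketched as follows. The class-to-pixel-value mapping and the file names are assumptions for illustration only; the published Kust4K masks define their own pixel values.

```python
import json
from PIL import Image, ImageDraw

# Illustrative class-to-pixel-value mapping; the actual values are defined
# by the dataset authors.
CLASS_IDS = {"Road": 1, "Building": 2, "Motorcycle": 3, "Car": 4,
             "Truck": 5, "Tree": 6, "Person": 7, "Traffic Facilities": 8}

def labelme_json_to_mask(json_path, out_path):
    """Rasterize Labelme polygon annotations into a single-channel PNG mask."""
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:
        class_id = CLASS_IDS.get(shape["label"], 0)
        polygon = [tuple(pt) for pt in shape["points"]]
        draw.polygon(polygon, fill=class_id)
    mask.save(out_path)

# Hypothetical file names for a single annotated pair.
labelme_json_to_mask("pair_0001.json", "pair_0001_mask.png")
```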
3.0 Dataset Profile and Technical Records
3.1 Dataset Attributes and Contribution to Resilient Systems (SDG 9)
The Kust4K dataset is structured to drive the development of robust and resilient intelligent systems. It includes 4,024 RGB-TIR image pairs, with a significant portion captured under challenging low-light conditions to ensure models can perform reliably around the clock. Furthermore, to enhance model adaptability and contribute to the creation of resilient infrastructure (SDG 9), a subset of the data includes simulated modality failures:
- RGB Failure Samples: Pixel values are set to zero to simulate complete darkness or sensor malfunction.
- TIR Failure Samples: Gaussian blur is applied to simulate focus issues or vibrations during flight.
This feature enables the development of models that are not overly reliant on a single data source, a key attribute for safety-critical systems in sustainable cities.
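A minimal sketch of how such modality failures could be simulated is shown below; the Gaussian kernel size and sigma are illustrative choices rather than the dataset's published parameters.

```python
import cv2
import numpy as np

def simulate_rgb_failure(rgb_image):
    """RGB failure: zero all pixel values to mimic darkness or a dead sensor."""
    return np.zeros_like(rgb_image)

def simulate_tir_failure(tir_image, kernel_size=15, sigma=5.0):
    """TIR failure: Gaussian blur to mimic defocus or in-flight vibration.
    Kernel size and sigma are illustrative, not the dataset's published values."""
    return cv2.GaussianBlur(tir_image, (kernel_size, kernel_size), sigma)
```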
3.2 Dataset Splits for Standardized Evaluation
The dataset is divided into training, testing, and validation subsets in a 7:2:1 ratio. The proportion of daytime and nighttime samples is kept consistent across these splits to prevent model bias and to ensure that performance evaluations are fair and representative of real-world operational conditions. This structured approach supports transparent and reproducible research, fostering a collaborative innovation ecosystem.
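Such a split can be reproduced in spirit with a simple stratified shuffle, sketched below. The sample identifiers, day/night flags, and random seed are assumptions; the authors' exact partitioning procedure may differ.

```python
import random

def stratified_split(sample_ids, is_night, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split sample IDs into train/test/val (7:2:1) while keeping the
    proportion of nighttime samples roughly constant in every subset."""
    rng = random.Random(seed)
    day = [s for s, flag in zip(sample_ids, is_night) if not flag]
    night = [s for s, flag in zip(sample_ids, is_night) if flag]
    splits = {"train": [], "test": [], "val": []}
    for group in (day, night):
        rng.shuffle(group)
        n_train = int(ratios[0] * len(group))
        n_test = int(ratios[1] * len(group))
        splits["train"] += group[:n_train]
        splits["test"] += group[n_train:n_train + n_test]
        splits["val"] += group[n_train + n_test:]
    return splits

# Hypothetical usage with placeholder identifiers and illustrative night flags.
ids = [f"pair_{i:04d}" for i in range(4024)]
flags = [i % 3 == 0 for i in range(4024)]
subsets = stratified_split(ids, flags)
```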
4.0 Technical Validation: Performance for Sustainable Urban Management
4.1 Evaluation of State-of-the-Art Models
The Kust4K dataset was used to evaluate eight state-of-the-art semantic segmentation models. The primary performance metric was the mean Intersection over Union (mIoU). The results demonstrate the dataset’s effectiveness and highlight key insights for developing technologies that support sustainable urban mobility.
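For reference, mIoU is the per-class Intersection over Union averaged over the eight categories. The sketch below computes it from a confusion matrix accumulated over flattened label maps; it assumes class indices 0 to 7 after remapping the mask pixel values, and the paper's exact evaluation protocol (e.g., handling of ignored pixels) may differ.

```python
import numpy as np

NUM_CLASSES = 8  # Road, Building, Motorcycle, Car, Truck, Tree, Person, Traffic Facilities

def confusion_matrix(pred, target, num_classes=NUM_CLASSES):
    """Accumulate a num_classes x num_classes confusion matrix from
    flattened prediction and ground-truth arrays of class indices."""
    valid = (target >= 0) & (target < num_classes)
    idx = num_classes * target[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_per_class(conf):
    """Per-class IoU: true positives divided by the union of prediction and target."""
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    return tp / np.maximum(union, 1)

def mean_iou(conf):
    """mIoU: the per-class IoU averaged over all classes."""
    return float(iou_per_class(conf).mean())
```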
4.2 Key Findings and Implications for SDGs
- Superiority of Multimodal Fusion: Models trained with multimodal (RGB+TIR) input consistently and significantly outperformed those trained on a single modality. For instance, the CNN-based RTFNet showed a 7.3% mIoU improvement over a comparable unimodal model. This finding validates that integrating diverse data sources is crucial for achieving the high level of accuracy required for intelligent transportation systems, which are foundational to safer and more efficient cities (SDG 11).
- Enhanced Segmentation of Small Objects: Attention-based and Mamba-based models demonstrated superior performance in segmenting small but critical objects like Motorcycles and Traffic Facilities. This capability is vital for improving road safety, a key target of SDG 3 (Good Health and Well-being) and SDG 11. The ability to accurately identify vulnerable road users and infrastructure elements enables more effective traffic control and accident prevention systems.
- Robustness in Modality Failure Scenarios: Experiments on samples with simulated sensor failures revealed that multimodal models maintain relatively stable performance by leveraging information from the functioning modality. This demonstrates a pathway to building resilient systems (SDG 9) that can withstand partial hardware failures, ensuring continuous and reliable operation for critical urban services.
5.0 Conclusion: A Catalyst for Sustainable Innovation
The Kust4K dataset is a significant contribution to the field of computer vision and intelligent transportation. By providing a large-scale, high-quality, and robust multimodal dataset, it directly addresses the technical barriers hindering the development of advanced urban monitoring systems. The validation experiments confirm that multimodal approaches are essential for building the reliable and resilient technologies needed to manage complex urban environments.
Ultimately, Kust4K serves as a valuable resource for the global research community, enabling the innovation required to build the smart, safe, and sustainable cities envisioned by the Sustainable Development Goals. It provides a foundational tool for developing next-generation AI solutions that can enhance traffic efficiency, improve public safety, and reduce the environmental impact of urban transportation, thereby making a direct contribution to achieving SDG 9 and SDG 11.
Analysis of Sustainable Development Goals (SDGs) in the Article
1. Which SDGs are addressed or connected to the issues highlighted in the article?
The article’s focus on developing advanced technology for urban traffic management and intelligent transportation systems connects it to several Sustainable Development Goals (SDGs). The primary SDGs addressed are:
- SDG 9: Industry, Innovation, and Infrastructure: The article is fundamentally about technological innovation. It introduces a new dataset (Kust4K) and evaluates state-of-the-art models to advance “intelligent transportation research.” This directly supports building resilient infrastructure and fostering innovation.
- SDG 11: Sustainable Cities and Communities: The research aims to solve “increasing complexity of traffic issues” in urban areas due to “accelerating pace of urbanization.” By creating tools for “robust urban traffic scene understanding,” the work contributes to making transportation systems in cities more efficient, reliable, and sustainable.
- SDG 3: Good Health and Well-being: While not a primary focus, improving traffic monitoring systems has direct implications for road safety. The ability to accurately identify “vehicles and pedestrians” is a critical first step in developing systems that can reduce traffic accidents and related injuries or fatalities.
2. What specific targets under those SDGs can be identified based on the article’s content?
Based on the article’s content, the following specific SDG targets can be identified:
- SDG 9: Industry, Innovation, and Infrastructure
  - Target 9.1: Develop quality, reliable, sustainable and resilient infrastructure. The article addresses this by creating a dataset and testing models designed for “robust urban traffic scene understanding under challenging conditions.” The focus on handling “low-illumination environments” and simulated “modality failure” directly contributes to developing more reliable and resilient intelligent transportation infrastructure.
  - Target 9.5: Enhance scientific research, upgrade the technological capabilities. The entire article is a manifestation of this target. It introduces a new, publicly available dataset (“Kust4K dataset is publicly available at figshare”) and conducts “extensive experiments with state-of-the-art models” to “advance robust urban traffic scene understanding,” thereby contributing directly to scientific research and technological advancement in this field.
- SDG 11: Sustainable Cities and Communities
  - Target 11.2: Provide access to safe, affordable, accessible and sustainable transport systems for all, improving road safety. The research aims to overcome the limitations of “traditional urban traffic management methods” to meet the demands for “efficient, comprehensive, and reliable monitoring.” This work serves as a foundational technology for developing “intelligent transportation systems” that can improve traffic flow and safety, making urban transport more sustainable.
- SDG 3: Good Health and Well-being
  - Target 3.6: Halve the number of global deaths and injuries from road traffic accidents. The technology’s ability to perform “pixel-level classification” to identify categories such as “Car,” “Truck,” “Motorcycle,” and “Person” is crucial for road safety applications. Accurate detection of vulnerable road users (pedestrians, motorcyclists) is a prerequisite for systems that can prevent accidents, thus contributing to this target.
3. Are there any indicators mentioned or implied in the article that can be used to measure progress towards the identified targets?
The article, being a technical paper, does not mention official SDG indicators. However, it contains several metrics and outputs that can be interpreted as proxy indicators for measuring technological progress towards the identified targets:
- For Target 9.1 (Resilient Infrastructure): The article explicitly tests for robustness against sensor failure. An implied indicator is the model’s performance (mIoU) under modality failure scenarios. As shown in Table 5, the study measures the mIoU when the RGB or TIR modality fails, providing a quantifiable measure of the system’s resilience. A smaller drop in performance indicates higher resilience.
- For Target 9.5 (Enhance Research): The primary output of the research is the Kust4K dataset itself. Its specifications—”4,024 of 640 × 512 pixel-aligned RGB-Thermal Infrared image pairs”—serve as an indicator of the new resources made available to the research community to foster innovation.
- For Target 11.2 (Sustainable Transport Systems): The core technical metric used in the paper, mean Intersection over Union (mIoU), serves as a direct indicator of the system’s capability. As stated in the “Performance evaluation metrics” section, mIoU assesses the performance of semantic segmentation models. A higher mIoU score, as detailed in Table 3, indicates a more accurate and reliable understanding of the traffic scene, which is essential for any effective intelligent transport system.
- For Target 3.6 (Road Safety): The article provides a detailed performance breakdown by category. Therefore, the segmentation accuracy (IoU) for specific vulnerable road user categories like “Person” and “Motorcycle” (as shown in Table 3) can be used as an indicator. Higher accuracy in identifying these categories is a direct measure of the technology’s potential to contribute to road safety systems.
4. Summary Table of SDGs, Targets, and Indicators
| SDGs | Targets | Indicators (Mentioned or Implied in the Article) |
|---|---|---|
| SDG 9: Industry, Innovation, and Infrastructure | 9.1: Develop quality, reliable, sustainable and resilient infrastructure. 9.5: Enhance scientific research and upgrade technological capabilities. | – Performance of segmentation models under simulated modality failure (measured in mIoU) to quantify system resilience. – The creation and public release of the Kust4K dataset (4,024 RGB-TIR image pairs) as a new resource for research. |
| SDG 11: Sustainable Cities and Communities | 11.2: Provide access to safe and sustainable transport systems for all. | – The overall performance of the semantic segmentation technology, measured by mean Intersection over Union (mIoU), as a proxy for the effectiveness of intelligent transportation systems. |
| SDG 3: Good Health and Well-being | 3.6: Halve global deaths and injuries from road traffic accidents. | – Segmentation accuracy (IoU) for specific vulnerable road user categories, particularly “Person” and “Motorcycle,” as a measure of the technology’s potential for safety applications. |
Source: nature.com