Adversarial susceptibility analysis for water quality prediction models – Nature

Oct 29, 2025 - 16:30

Report on Water Quality Prediction and its Implications for Sustainable Development Goals

Executive Summary

This report details a study on water quality in the Gujarat region of India, utilizing machine learning (ML) to predict contamination and pathogen presence. The research directly addresses key United Nations Sustainable Development Goals (SDGs), primarily SDG 6 (Clean Water and Sanitation) and SDG 3 (Good Health and Well-being). By employing advanced analytical models, the study aims to create a proactive monitoring system to prevent waterborne disease outbreaks. Various classifiers were tested, with Random Forest and Bagging models achieving the highest accuracy at 98.53%. To ensure transparency and actionable insights for policymakers, Explainable AI (XAI) techniques, specifically SHAP, were used to identify the most significant factors contributing to contamination. Critically, the study also evaluates the robustness of these models against adversarial attacks, simulating real-world data corruption. The results show a significant performance drop (up to 56%) under attack, underscoring the necessity of building resilient AI systems as part of innovative and sustainable infrastructure (SDG 9) to protect public health. This highlights a crucial gap between model accuracy and real-world reliability, which is vital for achieving sustainable development targets.

Introduction: Water Quality as a Cornerstone for Sustainable Development

The Challenge to SDG 3 and SDG 6 in Gujarat

Access to clean water is a fundamental requirement for achieving global sustainability, forming the bedrock of SDG 6 (Clean Water and Sanitation) and directly impacting SDG 3 (Good Health and Well-being). However, rapid urbanization and industrialization, particularly in regions like Gujarat, have led to severe water contamination, posing a significant threat to these goals. Inadequate waste management and high dependence on groundwater have compromised water safety, leading to a rise in waterborne diseases. The World Health Organization (WHO) estimates that diarrheal diseases, largely preventable with safe water, cause 7.3 million deaths annually. This public health crisis underscores the urgent need for effective water quality management. Traditional monitoring methods often fail to detect rare but dangerous pathogens, necessitating advanced technological solutions to create early warning systems and safeguard communities, thereby contributing to the development of sustainable cities (SDG 11).

Methodology: Leveraging Innovation for SDG Monitoring (SDG 9)

Data Collection and Preprocessing

To build a predictive model aligned with SDG targets, this study utilized a comprehensive dataset from the Central Pollution Control Board of India, covering a five-year period (2017-2022) across 22 states. The dataset includes critical water quality parameters such as pH, BOD, Dissolved solids, and Fecal Coliform. The data was meticulously preprocessed to ensure model accuracy and reliability. Missing values were handled using mean and median imputation, a crucial step in preparing robust data for analysis that can support evidence-based policymaking for sustainable water management.
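The imputation step described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the `impute` helper and the sample readings are hypothetical, and the report does not say which columns received mean versus median imputation.

```python
from statistics import mean, median

def impute(column, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    if strategy not in ("mean", "median"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in column]

# Hypothetical readings; None marks a missing measurement.
ph = [7.0, None, 8.0, 6.0]
fecal_coliform = [10.0, 12.0, None, 5000.0]

ph_filled = impute(ph, "mean")                 # gap filled with the mean, 7.0
fc_filled = impute(fecal_coliform, "median")   # gap filled with the median, 12.0
```

The median is often preferred for heavily skewed parameters such as coliform counts, because a single extreme reading (5000.0 here) would pull the mean far away from typical values.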

Application of Machine Learning Classifiers

In pursuit of SDG 9 (Industry, Innovation, and Infrastructure), this study implemented and compared several advanced machine learning models to classify water quality and predict susceptibility to waterborne diseases. The classifiers employed include:

  • HistGradientBoosting
  • Random Forest
  • AdaBoost
  • Bagging
  • Decision Tree
  • Long Short-Term Memory (LSTM)

These models were trained to identify patterns of contamination, providing a powerful tool for authorities to monitor water quality and take preventive action, thus operationalizing the goals of SDG 6.
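To make one of the listed ensemble methods concrete, here is a from-scratch sketch of bagging: several weak learners are each trained on a bootstrap sample, and their predictions are combined by majority vote. The single-feature decision stumps, the function names, and the toy data are assumptions for illustration only; the study's actual configuration and library are not described in this report.

```python
import random

def fit_stump(xs, ys):
    """Pick the threshold t minimizing training errors for the rule 'predict 1 if x > t'."""
    values = sorted(set(xs))
    if len(values) == 1:
        candidates = [values[0]]
    else:
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_err = candidates[0], len(xs) + 1
    for t in candidates:
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = rng.choices(range(n), k=n)
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict(stumps, x):
    """Majority vote over the stumps; an odd ensemble size avoids ties."""
    votes = sum(x > t for t in stumps)
    return int(2 * votes > len(stumps))

# Hypothetical fecal coliform readings: label 1 means "contaminated".
xs = [100 + 10 * i for i in range(10)] + [600 + 10 * i for i in range(10)]
ys = [0] * 10 + [1] * 10

stumps = fit_bagging(xs, ys)
accuracy = sum(predict(stumps, x) == y for x, y in zip(xs, ys)) / len(xs)
```

Random Forest follows the same bootstrap-and-vote idea but uses full decision trees and also samples a random subset of features at each split.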

Ensuring Transparency and Robustness

For AI-driven systems to be effective in public policy, they must be both transparent and robust. This study addressed this by:

  1. Integrating Explainable AI (XAI): Using SHapley Additive exPlanations (SHAP), the “black-box” nature of the ML models was interpreted. This provides clear insights into which water quality parameters are the most influential predictors of contamination, enabling targeted interventions.
  2. Conducting Adversarial Robustness Testing: The models were tested against two adversarial attacks, the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). These attacks simulate sensor noise or data tampering to assess the resilience of the AI system, a critical requirement for any technology deployed within public health infrastructure.
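The two attacks named above can be sketched on a small differentiable model. FGSM takes a single step of size eps in the direction of the sign of the loss gradient with respect to the input; PGD repeats smaller steps and projects the result back into an eps-ball around the original input. The logistic-regression stand-in, its weights, and the sample point below are assumptions chosen for illustration. The report does not explain how gradients were obtained for the tree-based models, which are not differentiable.

```python
import math

def predict_proba(w, b, x):
    """Logistic regression: probability that the sample belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss_gradient(w, b, x, y):
    """Gradient of binary cross-entropy with respect to the input x: (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One step of size eps along the sign of the input gradient."""
    g = loss_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, alpha, steps):
    """Iterated sign-gradient steps, projected onto the L-infinity ball of radius eps."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_gradient(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

# Hypothetical model over two standardized features; true label 1 ("contaminated").
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1

x_fgsm = fgsm(w, b, x, y, eps=0.3)
x_pgd = pgd(w, b, x, y, eps=0.3, alpha=0.1, steps=5)

clean_label = int(predict_proba(w, b, x) > 0.5)      # correctly 1 on clean input
fgsm_label = int(predict_proba(w, b, x_fgsm) > 0.5)  # flipped to 0 by the attack
pgd_label = int(predict_proba(w, b, x_pgd) > 0.5)    # flipped to 0 by the attack
```

In this toy case a perturbation of at most 0.3 per feature is enough to turn a "contaminated" prediction into a "safe" one, which is the failure mode the robustness tests are designed to expose.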

Results and Analysis: Progress and Vulnerabilities in SDG Attainment

Model Performance in Predicting Water Contamination

The evaluation of various machine learning models demonstrated significant potential for AI-driven water quality monitoring. The Random Forest and Bagging classifiers emerged as the most effective, both achieving a peak accuracy of 98.53%. The HistGradientBoosting classifier also showed strong performance with an accuracy of up to 98.17%. In contrast, the LSTM deep learning model achieved an accuracy of 91.9%. This high level of accuracy indicates that ML models can serve as a reliable and efficient tool for identifying contaminated water sources, providing a scalable solution to support the monitoring requirements of SDG 6.

Insights from Explainable AI (XAI) for Targeted Interventions

The application of SHAP provided crucial insights into the key drivers of water contamination. The analysis revealed that parameters such as temperature and total coliform levels have a high impact on the model’s predictions. By identifying the most significant features, XAI makes the model’s decisions transparent and provides actionable intelligence. Health officials and environmental agencies can use this information to prioritize resources and develop targeted strategies to mitigate the most critical sources of pollution, leading to more efficient and effective water management in line with SDG 6 principles.
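The idea behind SHAP can be illustrated by computing exact Shapley values for a tiny model. Each feature's value is its average marginal contribution over all coalitions of the other features, with absent features replaced by a baseline. The scoring function, its coefficients, and the sample values below are hypothetical. The SHAP library typically averages over a background dataset and uses faster, model-specific algorithms, so this brute-force version is a conceptual sketch rather than the study's procedure.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def risk_score(z):
    """Hypothetical linear contamination score; pH is deliberately ignored."""
    temperature, total_coliform, ph = z
    return 0.02 * temperature + 0.001 * total_coliform

x = [30.0, 600.0, 7.5]          # sample: temperature, total coliform, pH
baseline = [25.0, 100.0, 7.0]   # reference values standing in for "feature absent"

phi = shapley_values(risk_score, x, baseline)
```

For a linear model each Shapley value equals the coefficient times the feature's deviation from the baseline, so the unused pH feature receives zero. The values always sum to the difference between the model's output on the sample and on the baseline.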

Assessing Model Resilience for Public Health Security (SDG 3 & SDG 9)

While the models demonstrated high accuracy on clean data, the adversarial attack simulations revealed critical vulnerabilities. The performance of the top-performing models dropped significantly when subjected to perturbations, highlighting a major risk for real-world deployment.

  • The Random Forest model, despite its 98.53% clean accuracy, saw its accuracy fall to 40.95% under a PGD attack.
  • The Simple Neural Network model, with a lower clean accuracy of 74.77%, proved more resilient. Its accuracy also dropped under FGSM and PGD attacks, but it stabilized rather than collapsing as the Random Forest model did.
  • Overall, a performance drop of up to approximately 56% was observed under FGSM and PGD attacks before adversarial training.

This finding is critical for the SDGs. An unreliable monitoring system can misclassify contaminated water as safe, leading to disease outbreaks and undermining progress on SDG 3. It shows that high accuracy alone is insufficient; building robust and resilient AI systems is essential for the sustainable infrastructure envisioned in SDG 9.
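A performance drop can be expressed in two ways, and the distinction matters when comparing models. The sketch below applies both definitions to the Random Forest figures quoted in this report (98.53% clean, 40.95% under PGD). The helper name is hypothetical, and the report does not state which definition, or which model, underlies the figure of roughly 56%.

```python
def accuracy_drop(clean_acc, attacked_acc):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = clean_acc - attacked_acc
    relative = 100.0 * absolute / clean_acc
    return absolute, relative

# Random Forest figures quoted in this report.
abs_drop, rel_drop = accuracy_drop(98.53, 40.95)
# abs_drop is about 57.6 percentage points; rel_drop is about 58.4 percent.
```

Reporting both numbers, together with the attack strength used, makes robustness results easier to compare across models with different clean accuracies.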

Conclusion and Policy Implications for Sustainable Development

Key Findings and Contributions to the SDGs

This study provides a dual contribution to the sustainable development agenda. First, it successfully demonstrates the high potential of machine learning and XAI to create powerful, transparent, and scalable systems for monitoring water quality. Second, and more critically, it exposes the profound vulnerability of these systems to data perturbations, which poses a direct threat to public health and safety.

  1. Advanced Monitoring for SDG 6: The high accuracy of models like Random Forest offers a pathway to enhance and automate the monitoring of clean water resources.
  2. Actionable Insights for Health Policy: XAI techniques provide the transparency needed for policymakers to trust and act upon AI-driven recommendations for pollution control.
  3. A Call for Resilient Infrastructure (SDG 9): The dramatic failure of accurate models under adversarial conditions highlights the urgent need to prioritize robustness and security in the design of AI for critical applications, ensuring they reliably support SDG 3.

Recommendations for Action

To translate these findings into progress toward the SDGs, the following actions are recommended:

  • Adopt Robust AI Frameworks: Public health and environmental agencies should mandate that any AI system used for critical monitoring, such as water quality, must undergo rigorous adversarial testing and incorporate robustness-enhancing techniques like adversarial training.
  • Leverage XAI for Policy: Integrate insights from XAI tools to develop data-driven, targeted interventions that address the most significant pollutants and contamination sources, optimizing resource allocation for achieving SDG 6.
  • Foster Partnerships for Innovation (SDG 17): Encourage collaboration between research institutions, government bodies, and technology developers to build secure, resilient, and transparent AI solutions for public health and environmental management.

Ultimately, this research serves as a critical reminder that for technology to advance sustainable development, it must be not only innovative but also resilient, trustworthy, and securely integrated into our public infrastructure.

1. Which SDGs are addressed or connected to the issues highlighted in the article?

  • SDG 3: Good Health and Well-being

    The article directly connects water quality to public health, highlighting the prevalence of waterborne diseases. It states that “Water quality is a critical factor for human health” and describes contamination as “increasing the prevalence of waterborne diseases.” The text cites WHO estimates that “7.3 million deaths occur annually due to diarrheal diseases” and discusses a specific outbreak of Guillain-Barré syndrome linked to the pathogen Campylobacter jejuni in contaminated water. The study’s core motivation is to use AI for “proactive water quality monitoring and pathogen detection to prevent disease outbreaks.”

  • SDG 6: Clean Water and Sanitation

    This is the central SDG addressed. The article’s primary focus is on water quality assessment, contamination, and the challenges of ensuring safe drinking water. It discusses how “Rapid urbanization and industrialization have led to significant water contamination” and mentions specific challenges in Gujarat, including “inadequate waste management.” The study analyzes water parameters like “pH, BOD, Dissolved solids, … Fecal Coliform, and Fecal Streptococci” to assess pollution levels, directly aligning with the goal of improving water quality and ensuring access to clean water.

  • SDG 9: Industry, Innovation, and Infrastructure

    The article showcases the use of advanced technology and innovation to address a public health and environmental challenge. The entire study is based on employing “machine learning models to analyze contamination patterns” and using “Explainable AI techniques” to provide actionable insights. It also explores the need for “resilient AI systems in public health” by testing model robustness against adversarial attacks, which relates to building reliable and resilient infrastructure for critical services like water quality monitoring.

  • SDG 11: Sustainable Cities and Communities

    The article links the problem of water contamination directly to urban development. It states that “Rapid urbanization and industrialization have led to significant water contamination.” The pilot study in Gujarat highlights issues prevalent in urban and peri-urban areas, such as the “disposal of wastewater” in “open and closed gutters” and sewage lines that “remain open.” This points to the challenge of managing the environmental impact of cities, particularly regarding waste and water management, which is a key aspect of SDG 11.

2. What specific targets under those SDGs can be identified based on the article’s content?

  • SDG 3: Good Health and Well-being

    • Target 3.3: By 2030, end the epidemics of AIDS, tuberculosis, malaria and neglected tropical diseases and combat hepatitis, water-borne diseases and other communicable diseases.

      Explanation: The article is fundamentally about combating waterborne diseases. It explicitly mentions diarrheal diseases, Guillain-Barré syndrome linked to Campylobacter jejuni, and the overall goal of developing an “AI-driven decision support system to assist environmental agencies and policymakers in early disease outbreak detection.”
    • Target 3.9: By 2030, substantially reduce the number of deaths and illnesses from hazardous chemicals and air, water and soil pollution and contamination.

      Explanation: The study aims to identify and predict water contamination from pollutants and pathogens, which are the direct causes of illness and death. The introduction states that compromised water quality “poses a significant health risk,” and the entire research is geared towards mitigating this risk through better monitoring.
  • SDG 6: Clean Water and Sanitation

    • Target 6.1: By 2030, achieve universal and equitable access to safe and affordable drinking water for all.

      Explanation: The article highlights the challenges in ensuring “safe drinking water” in regions like Gujarat. The development of a model to predict water potability and detect pathogens is a direct contribution towards ensuring the water supplied to communities is safe for consumption.
    • Target 6.3: By 2030, improve water quality by reducing pollution, eliminating dumping and minimizing release of hazardous chemicals and materials, halving the proportion of untreated wastewater and substantially increasing recycling and safe reuse globally.

      Explanation: The article discusses water contamination from “domestic, industrial, and agricultural activities” and inadequate “waste management.” The pilot study observes the “disposal of wastewater” into open gutters. By developing models that can identify key pollutants (“total coliform is too high crossing the permissible limits”), the research provides tools to better manage and reduce water pollution from these sources.
  • SDG 9: Industry, Innovation, and Infrastructure

    • Target 9.1: Develop quality, reliable, sustainable and resilient infrastructure, including regional and transborder infrastructure, to support economic development and human well-being, with a focus on affordable and equitable access for all.

      Explanation: The article’s focus on the robustness of AI models (“adversarial training,” “resilient AI systems”) for water quality monitoring contributes to developing reliable technological infrastructure for public health. It notes that “Government or NGO decisions based on wrong predictions can have severe consequences,” emphasizing the need for resilient systems.
    • Target 9.5: Enhance scientific research, upgrade the technological capabilities of industrial sectors in all countries, in particular developing countries, including, by 2030, encouraging innovation and substantially increasing the number of research and development workers per 1 million people and public and private research and development spending.

      Explanation: The study is a piece of scientific research that applies cutting-edge technologies like machine learning, deep learning (LSTM), and Explainable AI (SHAP) to a critical development problem in India. This directly aligns with enhancing scientific research and upgrading technological capabilities for environmental and health management.
  • SDG 11: Sustainable Cities and Communities

    • Target 11.6: By 2030, reduce the adverse per capita environmental impact of cities, including by paying special attention to air quality and municipal and other waste management.

      Explanation: The article identifies “inadequate waste management” and the “disposal of wastewater” from industries and households in open gutters as a primary cause of water contamination in the Gujarat region. This directly relates to the challenge of managing municipal and industrial waste to reduce the environmental impact of cities.

3. Are there any indicators mentioned or implied in the article that can be used to measure progress towards the identified targets?

  • Indicators for Health (SDG 3)

    • Mortality rate attributed to unsafe water (Implied): The article cites the WHO statistic that “7.3 million deaths occur annually due to diarrheal diseases,” which serves as a global indicator of the scale of the problem. A reduction in such deaths would measure progress.
    • Incidence of waterborne diseases (Mentioned): The epidemiological survey conducted in Gujarat, which identified “signs and symptoms observed from 250 people” and plotted diarrheal diseases, is a direct measurement of disease incidence in a specific population. This can be used as an indicator of progress under Target 3.3.
  • Indicators for Water Quality (SDG 6)

    • Proportion of bodies of water with good ambient water quality (Mentioned): The study uses a dataset with specific water quality parameters: “pH, BOD, Dissolved solids, Temperature, Conductivity, Nitrogen, Fecal Coliform, and Fecal Streptococci.” These are standard metrics used to assess ambient water quality (related to Indicator 6.3.2). The SHAP analysis identifies “Total coliform” as a dominant feature, which is a key indicator of fecal contamination.
    • Proportion of population using safely managed drinking water services (Implied): The study’s goal of creating a “Water Potability Prediction Model” directly addresses the “safety” aspect of drinking water. The model’s output (classifying water as safe or unsafe) serves as a proxy indicator for the quality of water being supplied.
  • Indicators for Innovation and Infrastructure (SDG 9)

    • Accuracy and robustness of predictive models (Mentioned): The article provides specific metrics for the performance of its AI models, such as “highest accuracy at 98.53%” for Random Forest and Bagging classifiers. It also measures the “performance drop of up to approx. 56%” under adversarial attacks. These metrics serve as indicators of the reliability and resilience of the technological infrastructure being developed.
  • Indicators for Sustainable Cities (SDG 11)

    • Proportion of municipal solid waste collected and managed in controlled facilities (Implied): The article’s observation that “the count of disposing the waste in open gutter is more” and that “sewage lines in most of the areas remain open” is a qualitative indicator of poor waste management. This implies a low proportion of waste being managed in controlled facilities (related to Indicator 11.6.1).

4. Table of SDGs, Targets, and Indicators

SDGs | Targets | Indicators Identified in the Article
SDG 3: Good Health and Well-being | 3.3: End epidemics of… water-borne diseases. 3.9: Reduce deaths and illnesses from… water… pollution and contamination. | Number of deaths from diarrheal diseases (cites WHO data); Incidence of waterborne diseases (measured in the Gujarat survey); Presence of specific pathogens (e.g., Campylobacter jejuni, Fecal Coliform)
SDG 6: Clean Water and Sanitation | 6.1: Achieve access to safe and affordable drinking water. 6.3: Improve water quality by reducing pollution… and halving the proportion of untreated wastewater. | Water quality parameters (pH, BOD, Dissolved solids, Fecal Coliform, etc.); Concentration of Total Coliform (identified as a key feature); Model prediction of water potability (safe/unsafe classification)
SDG 9: Industry, Innovation, and Infrastructure | 9.1: Develop quality, reliable, sustainable and resilient infrastructure. 9.5: Enhance scientific research, upgrade technological capabilities. | Accuracy of machine learning models (e.g., 98.53%); Robustness of AI models under adversarial attacks (performance drop measured in %); Application of advanced technologies (AI, XAI, LSTM)
SDG 11: Sustainable Cities and Communities | 11.6: Reduce the adverse per capita environmental impact of cities, including… municipal and other waste management. | Prevalence of improper wastewater disposal (observation of waste in “open gutter”); Status of sanitation infrastructure (observation of “open” sewage lines)

Source: nature.com

 
