Predicting wheat yield using deep learning and multi-source environmental data – Nature

Report on Deep Learning for Wheat Yield Prediction and its Contribution to Sustainable Development Goals

Executive Summary

This report details the development and evaluation of DeepAgroNet, a novel deep learning framework designed to forecast winter wheat yields in southern Pakistan. By integrating satellite, meteorological, and soil data, the framework directly addresses critical Sustainable Development Goals (SDGs), particularly SDG 2 (Zero Hunger) by enhancing food security through accurate pre-harvest predictions. The study employed three deep learning models: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Artificial Neural Networks (ANN). The CNN model demonstrated superior performance, achieving 98% accuracy one month before harvest. These findings underscore the potential of advanced technologies to support SDG 9 (Industry, Innovation, and Infrastructure) and SDG 12 (Responsible Consumption and Production) by promoting precision agriculture and optimizing resource management. The framework’s success serves as a scalable model for improving agricultural sustainability and addressing climate-related challenges (SDG 13: Climate Action) in other regions globally.

1. Introduction: Aligning Agricultural Forecasting with Sustainable Development Goals

The accurate and timely prediction of crop yields is fundamental to national and global food security strategies. Wheat, a staple food in Pakistan, presents a significant forecasting challenge due to complex environmental interactions. This study introduces a deep learning approach to address this, aligning technological innovation with key sustainable development objectives.

1.1. The Imperative of Food Security (SDG 2: Zero Hunger)

Ensuring a stable food supply is paramount for achieving Zero Hunger. Traditional methods of yield estimation in Pakistan are often manual and slow, and they lack the precision needed for effective planning. This research aims to overcome these limitations by providing reliable, data-driven yield forecasts. Such forecasts enable policymakers to make informed decisions about food storage, distribution, and market stability, which contributes directly to SDG 2.

1.2. Leveraging Innovation for Sustainable Agriculture (SDG 9 & SDG 12)

Modernizing agricultural practices through technological innovation is a core component of SDG 9. This study uses advanced deep learning models to build a robust forecasting system. Better prediction accuracy supports precision agriculture, which in turn promotes more efficient use of resources such as water and fertilizer. This optimization aligns with the principles of SDG 12 (Responsible Consumption and Production) by avoiding unnecessary agricultural inputs and reducing environmental impact.

The primary objectives of this research were to:

  1. Explore the viability of deep learning for forecasting wheat yields using historical, multi-source data.
  2. Demonstrate the importance of integrating climate, soil, and remote sensing data for accurate estimation.
  3. Develop a scalable model to replace conventional manual estimation processes in Pakistan.
  4. Provide accurate yield predictions before the harvest season to support strategic agricultural planning.

2. Methodology: An Integrated Framework for Sustainable Yield Prediction

The study was conducted in the Multan district of Punjab, a primary wheat-growing region in Pakistan. A multi-source data approach was adopted to build a comprehensive dataset for training the deep learning models.

2.1. Data Acquisition and Sources

A diverse range of data from 2017 to 2022 was collected and integrated to capture the various factors that influence crop yield. Combining data from several national and international agencies reflects the collaborative approach promoted by SDG 17 (Partnerships for the Goals).

  • Satellite Data: Landsat 8 satellite imagery was sourced from the USGS to calculate the Normalized Difference Vegetation Index (NDVI), a key indicator of crop health.
  • Meteorological Data: Climatic data, including temperature and precipitation, were obtained from NASA’s POWER Data Access Viewer and the Pakistan Meteorological Department. These variables are crucial for modeling the impacts of climate variability, a key concern of SDG 13 (Climate Action).
  • Soil Data: Soil characteristics such as type, pH, and fertility were collected from the Punjab Soil Fertility Authority to model their impact on crop productivity.
  • Yield Data: Official statistics on wheat area, production per acre, and total tonnage were obtained from the Crop Reporting Services (CRS) of the Government of Punjab, serving as the benchmark for model validation.
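The study computed NDVI in Google Earth Engine. The plain-Python sketch below is a hedged illustration of the underlying formula, NDVI = (NIR - Red) / (NIR + Red). For Landsat 8 OLI, the near-infrared band is Band 5 and the red band is Band 4. The sketch also averages valid pixels into a single district value, which matches the district-level aggregation used in the study. The helper names, and the use of `None` to mark cloud-masked pixels, are assumptions made for illustration.

```python
from statistics import mean
from typing import List, Optional, Sequence


def ndvi(nir: float, red: float) -> Optional[float]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    For Landsat 8 OLI, NIR is Band 5 and Red is Band 4.
    Returns None when NIR + Red is zero, where NDVI is undefined.
    """
    denom = nir + red
    if denom == 0:
        return None
    return (nir - red) / denom


def district_mean_ndvi(
    nir_band: Sequence[Optional[float]],
    red_band: Sequence[Optional[float]],
) -> float:
    """Average NDVI over a district's pixels, skipping masked (None) pixels.

    Hypothetical helper: assumes both bands are already scaled to
    reflectance, clipped to the district boundary, and flattened into
    sequences of equal length.
    """
    values: List[float] = []
    for nir, red in zip(nir_band, red_band):
        if nir is None or red is None:
            continue  # cloud-masked or no-data pixel
        v = ndvi(nir, red)
        if v is not None:
            values.append(v)
    if not values:
        raise ValueError("no valid pixels in district")
    return mean(values)
```

For example, `district_mean_ndvi([0.4, 0.5, None], [0.1, 0.1, 0.2])` skips the masked third pixel and averages the NDVI of the first two pixels (0.6 and about 0.667).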

2.2. Data Integration and Preprocessing

The Google Earth Engine (GEE) platform was utilized for processing and integrating remote sensing data. All datasets were spatially aggregated at the district level and normalized. To isolate the impact of annual weather variations from long-term technological and management improvements, the historical yield data was detrended. This step is crucial for building a model that accurately reflects the influence of environmental factors rather than consistent year-over-year growth trends.
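The report does not state which detrending or normalization method was used. A linear least-squares trend and min-max scaling are common choices, and both are assumed in the sketch below. The residuals keep the year-to-year variation that weather drives and remove the steady trend attributed to technology and management. The function names are illustrative.

```python
from statistics import mean
from typing import List, Sequence


def linear_detrend(years: Sequence[int], yields: Sequence[float]) -> List[float]:
    """Remove an ordinary least-squares linear trend from a yield series.

    Returns residuals (observed yield minus fitted trend value).
    """
    if len(years) != len(yields) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(years), mean(yields)
    sxx = sum((x - x_bar) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, yields))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(years, yields)]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

A yield series that rises by exactly the same amount every year detrends to all zeros, so only departures from the trend are left for the model to explain.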

2.3. Deep Learning Model Development

A three-branch deep learning framework, DeepAgroNet, was developed using the TensorFlow toolkit. Three distinct models were trained and evaluated to handle different aspects of the data:

  • Convolutional Neural Network (CNN): Chosen for its strength in extracting spatial features from satellite imagery and patterns from climatic data arrays.
  • Recurrent Neural Network (RNN): Employed for its ability to model temporal dependencies and sequences in historical yield and weather data.
  • Artificial Neural Network (ANN): Used for its versatility in handling structured, tabular data combining various static features like soil properties.
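The plain-Python sketch below shows the core operation each branch type applies to its kind of input:

- a 2D convolution over a gridded raster such as NDVI (CNN);
- a recurrence over a monthly weather sequence (RNN);
- a fully connected unit over static soil features (ANN).

This is a toy illustration with hand-set weights, not the DeepAgroNet architecture. The actual models were built in TensorFlow, and their layer configurations are not given in this report.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def conv2d_valid(grid: Grid, kernel: Grid) -> Grid:
    """CNN-style op: 2D cross-correlation with 'valid' padding (no border)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(grid) - kh + 1, len(grid[0]) - kw + 1
    return [
        [
            sum(grid[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def rnn_last_state(seq: Sequence[float], w_x: float, w_h: float, b: float) -> float:
    """RNN-style op: scalar Elman recurrence h_t = tanh(w_x*x_t + w_h*h_{t-1} + b).

    The final hidden state summarizes the whole sequence, e.g. monthly temperatures.
    """
    h = 0.0
    for x in seq:
        h = math.tanh(w_x * x + w_h * h + b)
    return h


def dense_relu(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """ANN-style op: one fully connected unit with ReLU, e.g. over soil pH and fertility."""
    return max(0.0, sum(f * w for f, w in zip(features, weights)) + bias)
```

In a real framework, each of these operations is stacked into many layers with learned weights. The sketch only shows why each branch suits its data type: convolution exploits spatial neighborhoods, recurrence carries state through time, and dense units combine unordered static features.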

The models were evaluated using a leave-one-year-out cross-validation method, with performance measured by the Root Mean Square Error (RMSE) and the Coefficient of Determination (R²).
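Leave-one-year-out cross-validation holds out each year in turn, trains on the remaining years, and predicts the held-out year. The sketch below pools those predictions and then computes RMSE and R². It assumes R² = 1 - SS_res / SS_tot; the report does not say which R² definition was used. The record layout and the `fit_mean_baseline` model are hypothetical placeholders for a trained network.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One district-season record: (year, feature vector, observed yield).
Sample = Tuple[int, List[float], float]
Predictor = Callable[[List[float]], float]
Fitter = Callable[[Sequence[Sample]], Predictor]


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def leave_one_year_out(samples: Sequence[Sample], fit: Fitter) -> Tuple[float, float]:
    """Hold out each year, train on the rest, and score the pooled predictions."""
    observed: List[float] = []
    predicted: List[float] = []
    for held_out in sorted({year for year, _, _ in samples}):
        train = [s for s in samples if s[0] != held_out]
        predict = fit(train)  # a fresh model for every fold
        for year, features, y in samples:
            if year == held_out:
                observed.append(y)
                predicted.append(predict(features))
    return rmse(observed, predicted), r_squared(observed, predicted)


def fit_mean_baseline(train: Sequence[Sample]) -> Predictor:
    """Hypothetical baseline: predict the mean training yield, ignoring features."""
    mean_yield = sum(y for _, _, y in train) / len(train)
    return lambda _features: mean_yield
```

The predictions are pooled before scoring for a practical reason. With district-level data there is only one observation per year, so each fold holds a single value, and R² cannot be computed within a fold. Any model can be passed as `fit`, provided it returns a callable that maps a feature vector to a yield.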

3. Results: Performance Analysis of Deep Learning Models

The three deep learning models demonstrated strong predictive capabilities, with all models achieving yield error rates below 10%. The results confirm the hypothesis that deep learning can significantly improve the accuracy of crop yield predictions over traditional methods.

3.1. Convolutional Neural Network (CNN) Performance

The CNN model emerged as the most effective forecasting tool. It achieved the highest accuracy, with an R² value of 0.77 and a forecast accuracy of 98% when predicting yields one month prior to harvest. Its ability to process spatial data from satellite images allowed it to effectively capture crop health variations across the district, leading to superior performance.
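The report does not define how "forecast accuracy" or "yield error rate" is calculated. One common convention, assumed here, uses the absolute percentage error relative to the observed (CRS) yield, with accuracy equal to 100 minus that error. Under this assumption, 98% accuracy means the forecast was within 2% of the observed yield. The function names are illustrative.

```python
def percent_error(observed: float, predicted: float) -> float:
    """Absolute percentage error of a forecast relative to the observed yield."""
    if observed == 0:
        raise ValueError("observed yield must be non-zero")
    return 100.0 * abs(predicted - observed) / observed


def forecast_accuracy(observed: float, predicted: float) -> float:
    """Accuracy as 100 minus the absolute percentage error (assumed convention)."""
    return 100.0 - percent_error(observed, predicted)
```

For example, a forecast of 49 against an observed 50 (in whatever yield unit the CRS reports) gives a 2% error and 98% accuracy. Under the same convention, an error rate below 10% corresponds to an accuracy above 90%.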

3.2. Recurrent Neural Network (RNN) and Artificial Neural Network (ANN) Performance

The RNN and ANN models also showed robust predictive power, though somewhat lower than that of the CNN.

  • The RNN model achieved an R² value of 0.72, demonstrating its competence in capturing temporal trends within the data.
  • The ANN model recorded an R² value of 0.66, proving effective at integrating diverse, static input variables.

3.3. Comparative Model Evaluation

A comparative analysis confirmed the superiority of the CNN model for this specific application. When benchmarked against official CRS data, the CNN’s predictions were consistently closer to the observed yields. The overall performance highlights the framework’s reliability and potential for scalability, providing a robust tool for agricultural management.

4. Discussion: Implications for Climate Action and Responsible Production

The success of the DeepAgroNet framework carries significant implications for advancing sustainable agriculture in Pakistan and beyond. The results not only validate the technical approach but also highlight pathways for addressing global challenges.

4.1. Model Efficacy and Contribution to Precision Agriculture

The high accuracy of the CNN model is attributed to its ability to interpret complex spatial patterns in satellite and climate data, which traditional statistical models often miss. This capability is a cornerstone of precision agriculture, enabling targeted interventions and resource allocation. By providing farmers and policymakers with accurate district-level forecasts, the model supports smarter farming, contributing to SDG 9 and SDG 12.

4.2. Climate Change Impacts on Wheat Yield (SDG 13: Climate Action)

The study analyzed the impact of key climate variables, such as temperature and rainfall, on wheat yields. The findings confirmed that wheat production in the Multan region is highly sensitive to climatic conditions, with extreme temperatures and variable rainfall patterns posing significant risks. By effectively modeling these relationships, the framework serves as a tool for climate change adaptation, allowing stakeholders to anticipate and mitigate the effects of adverse weather events, a direct contribution to SDG 13.
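The report does not say which statistic was used to relate climate variables to yield. One simple, hedged way to quantify sensitivity uses two quantities computed between a climate variable (for example, seasonal mean temperature) and the detrended yield anomaly:

- the Pearson correlation, which measures strength and direction;
- the least-squares slope, which gives the change in yield anomaly per unit of the climate variable.

Both function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def climate_sensitivity(
    climate: Sequence[float], yield_anomaly: Sequence[float]
) -> Tuple[float, float]:
    """Return (Pearson r, OLS slope) of yield anomaly against a climate variable.

    A negative r and slope for temperature would indicate that hotter
    seasons tend to coincide with below-trend yields.
    """
    n = len(climate)
    if n != len(yield_anomaly) or n < 2:
        raise ValueError("need at least two paired observations")
    mx = sum(climate) / n
    my = sum(yield_anomaly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(climate, yield_anomaly))
    sxx = sum((a - mx) ** 2 for a in climate)
    syy = sum((b - my) ** 2 for b in yield_anomaly)
    if sxx == 0 or syy == 0:
        raise ValueError("sensitivity undefined for a constant series")
    return sxy / math.sqrt(sxx * syy), sxy / sxx
```

With only a handful of seasons (2017 to 2022 in this study), such single-variable statistics are fragile. They complement a multivariate model rather than replace it.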

4.3. Advancing Responsible Production Practices (SDG 12)

Accurate yield forecasting can drive more responsible production. For instance, predictions of lower-than-average yields could prompt investigations into soil health or irrigation practices, leading to more sustainable management. Reliable forecasts could also be extended beyond a single district to the national level. At that scale they could help curb market speculation and reduce post-harvest losses by improving supply chain logistics. This would improve resource efficiency and promote sustainable consumption and production patterns, in line with the targets of SDG 12.

5. Conclusion and Recommendations

5.1. Summary of Findings

This study successfully demonstrated that a deep learning framework integrating multi-source environmental data can accurately predict winter wheat yields at the district level in Pakistan. The DeepAgroNet model, particularly its CNN branch, provides a reliable, scalable, and timely alternative to traditional manual forecasting methods. The research validates the use of advanced AI to enhance food security (SDG 2) and promote sustainable agricultural practices (SDG 9, 12, 13).

5.2. Recommendations for Policy and Practice

  1. Adoption by Government Agencies: Agricultural and food security departments in Pakistan should consider integrating this framework into their operational planning to improve the accuracy of national crop reports.
  2. Development of Farmer-Facing Tools: The model can be adapted to provide localized advisories to farmers, helping them make better decisions regarding irrigation, fertilization, and harvest timing.
  3. Investment in Data Infrastructure: Continued investment in accessible, high-quality agricultural and climate data is essential for refining and scaling up such predictive models.

5.3. Future Research Directions

While this study was confined to one district, the framework is designed for scalability. Future work should focus on expanding the model to cover other agricultural regions in Pakistan and other countries. Further research could also incorporate additional variables, such as the impact of floods, waterlogging, and soil salinity, to enhance model robustness and address a wider range of challenges relevant to sustainable agriculture.

Analysis of Sustainable Development Goals (SDGs) in the Article

1. Which SDGs are addressed or connected to the issues highlighted in the article?

  • SDG 2: Zero Hunger

    The article directly addresses SDG 2 by focusing on ensuring food security through accurate crop yield forecasting. It highlights that “Accurate forecasting of crop yields is essential for ensuring food security” and that wheat is a “primary staple food” in Pakistan, playing a “pivotal role in guaranteeing a stable food supply.” The study’s main objective is to improve wheat yield prediction, which is fundamental to ending hunger and managing food resources effectively.

  • SDG 9: Industry, Innovation, and Infrastructure

    This goal is central to the article’s methodology. The study introduces “DeepAgroNet, a novel three-branch deep learning framework” that integrates advanced technologies like “satellite imagery, meteorological data, and soil characteristics.” The use of Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Artificial Neural Networks (ANN) represents a significant scientific and technological innovation in agriculture. The article states that the framework is “adaptable” and can “serve as a model for similar applications in other agricultural regions,” promoting the enhancement of technological capabilities in developing countries.

  • SDG 13: Climate Action

    The article connects crop yield prediction to climate-related challenges. It acknowledges that “Climate change threatens the crop production systems of staple crops” and that yield is influenced by “climatic conditions.” By incorporating “meteorological data” and analyzing factors like temperature and precipitation, the proposed model aims to strengthen resilience and adaptive capacity to climate-related hazards. The ability to forecast yields accurately despite climate variability is a key aspect of climate action in agriculture.

  • SDG 15: Life on Land

    The study’s emphasis on using “soil characteristics” and “soil data” for yield prediction connects to the sustainable use of terrestrial ecosystems. The model considers factors like “soil quality,” “soil type, pH, [and] cation exchange capability.” Furthermore, it uses the Normalized Difference Vegetation Index (NDVI) derived from satellite data to monitor vegetation health. This data-driven approach supports “sustainable agricultural practices” that can lead to better management of land and soil, contributing to the restoration of land quality.

2. What specific targets under those SDGs can be identified based on the article’s content?

  1. SDG 2: Zero Hunger

    • Target 2.1: By 2030, end hunger and ensure access by all people… to safe, nutritious and sufficient food all year round. The article supports this by developing a tool for “ensuring food security” and a “stable food supply” through accurate yield prediction.
    • Target 2.3: By 2030, double the agricultural productivity and incomes of small-scale food producers. The study aims to improve “precision agriculture practices” and increase “agricultural productivity” by providing accurate yield forecasts, which can help farmers optimize inputs and management.
    • Target 2.4: By 2030, ensure sustainable food production systems and implement resilient agricultural practices that increase productivity and production… and that strengthen capacity for adaptation to climate change. The article’s framework is designed to create a resilient forecasting system by integrating climate and soil data, promoting “sustainable agricultural development.”
  2. SDG 9: Industry, Innovation, and Infrastructure

    • Target 9.5: Enhance scientific research, upgrade the technological capabilities of industrial sectors in all countries, in particular developing countries. The development and application of the “DeepAgroNet” framework is a direct example of enhancing scientific research and upgrading technology in the agricultural sector of Pakistan, a developing country.
    • Target 9.b: Support domestic technology development, research and innovation in developing countries. The study is a domestic research initiative in Pakistan aimed at creating an innovative solution (“minimize the local conventional manual process”) for a local problem, with the potential for global application.
  3. SDG 13: Climate Action

    • Target 13.1: Strengthen resilience and adaptive capacity to climate-related hazards and natural disasters in all countries. The model’s ability to provide a “forecast accuracy of 98% one month before harvest” serves as an early warning system, enhancing the agricultural sector’s resilience and capacity to adapt to climate variability and its impact on crop yields.
  4. SDG 15: Life on Land

    • Target 15.3: By 2030, combat desertification, restore degraded land and soil… and strive to achieve a land-degradation-neutral world. The model’s use of “soil data” and monitoring of vegetation health via NDVI contributes to better land management. Accurate predictions can prevent overuse of resources, thereby helping to maintain or improve soil quality over time.

3. Are there any indicators mentioned or implied in the article that can be used to measure progress towards the identified targets?

  1. For SDG Target 2.3 & 2.4 (Agricultural Productivity and Sustainability)

    • Indicator: Agricultural Productivity (Yield per acre/hectare). The article explicitly measures and predicts “wheat crop yield per acre” and “total wheat crop production in tons.” Tables 3, 5, 7, 9, 10, 11, and 13 provide specific values for these metrics, which are direct indicators of agricultural productivity.
    • Indicator: Accuracy of Yield Forecasts. The article reports a “forecast accuracy of 98% one month before harvest.” This metric serves as an indicator of the effectiveness of new technologies in supporting sustainable and resilient agriculture.
  2. For SDG Target 9.5 (Technological Innovation)

    • Indicator: Performance of the Technological Framework. The article uses several metrics to evaluate its innovative deep learning model, including the “R² value” (up to 0.77 for the best model), “Root Mean Square Error (RMSE),” and “Mean Absolute Error (MAE).” These quantitative measures serve as indicators of the success and advancement of the scientific research presented.
  3. For SDG Target 13.1 (Climate Resilience)

    • Indicator: Timeliness of Early Warning. The model’s ability to predict yield “one month before harvest” is a direct indicator of an improved early warning system. This lead time allows policymakers and farmers to take adaptive measures against potential climate-induced shortfalls.
  4. For SDG Target 15.3 (Land Health)

    • Indicator: Land Cover and Vegetation Health (NDVI). The article uses the “normalized difference vegetation index (NDVI)” as a key input. NDVI is a widely accepted proxy indicator for vegetation health and land cover, which can be used to monitor land degradation over time. The study’s reliance on this data implies its importance in measuring progress toward sustainable land management.

4. Summary Table of SDGs, Targets, and Indicators

SDGs | Targets | Indicators Identified in the Article
SDG 2: Zero Hunger | 2.1: Ensure access to sufficient food. 2.3: Double agricultural productivity. 2.4: Ensure sustainable and resilient food production systems. | Volume of production (total production in tons); agricultural productivity (yield per acre); accuracy and timeliness of yield forecasts (98% accuracy one month before harvest).
SDG 9: Industry, Innovation, and Infrastructure | 9.5: Enhance scientific research and upgrade technological capabilities. 9.b: Support domestic technology development and innovation. | Development of a novel technological framework (DeepAgroNet); performance metrics of the new technology (R², RMSE, MAE).
SDG 13: Climate Action | 13.1: Strengthen resilience and adaptive capacity to climate-related hazards. | Timeliness of early warning systems for crop yield (prediction one month before harvest); integration of meteorological data (temperature, precipitation) to model climate impact.
SDG 15: Life on Land | 15.3: Combat desertification and restore degraded land and soil. | Use of soil data (soil type, pH) for sustainable management; use of the Normalized Difference Vegetation Index (NDVI) to monitor land and vegetation health.

Source: nature.com