Data driven crop disease modeling

Abstract: Precision farming deals with the creation and use of data from machinery and sensors, on and off the field, to optimize resource use and sustainably intensify food production in step with increasing demand. As the volume of collected data grows, smarter data processing and analysis techniques are needed, which has prompted the evaluation and adoption of artificial intelligence (AI) and machine learning (ML) techniques for use cases ranging from seeding to harvesting. One use case that has yet to fully explore what AI and ML can offer is crop disease prediction. Since multiple biotic and abiotic factors can be responsible for the occurrence of a disease, modeling requires finding suitable data on these factors from multiple farms over an extended time frame and developing models able to capture the underlying relationships between them. This thesis presents research conducted to develop data-driven methodologies and optimization approaches for building crop disease models. The objective is realized by breaking the task down into three modules: (i) data collection; (ii) data processing and model building; and (iii) the maintenance of models in production. The traditional approach to collecting data for disease modeling is to set up trials, which is expensive and labor-intensive; this prompted the evaluation of novel, freely accessible data sources. In module one, two studies were therefore conducted to assess the suitability of social media platforms and remote sensing products. The results show that social media is not yet a viable option because of limited geo-referenced data and ambiguity in categorizing the discussions. On the other hand, vegetation indices derived from multispectral satellite imagery, despite their high spatial granularity, are an interesting addition to the modeling pipeline.
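As an illustration of what a vegetation index derived from multispectral imagery looks like, the following is a minimal sketch of the widely used Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red). The abstract does not say which indices the thesis used, so the choice of NDVI, the function names, and the per-field averaging step are assumptions made for illustration only.

```python
def ndvi(nir, red):
    """Return the NDVI for one pixel from near-infrared and red reflectance.

    Returns None when both bands are zero, because the index is undefined there.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def field_mean_ndvi(pixels):
    """Average NDVI over an iterable of (nir, red) pairs (hypothetical helper).

    Pixels with an undefined index are skipped. Returns None if no pixel is valid.
    """
    values = [ndvi(nir, red) for nir, red in pixels]
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)
```

A per-field mean such as this could serve as one feature among others in a disease model; whether the thesis aggregated indices this way is not stated in the abstract.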
Moving on to module two, a study was conducted to demonstrate the process of fusing and preparing data from multiple sources, in different formats and collected over an extended time frame, for use in model building. The study establishes the relevance of advanced machine learning models such as deep learning for the prediction of crop diseases. The results show that, given an appropriate data preparation process at the right data granularity and the use of a few additional techniques, neural network-based models have the potential to outperform widely used models such as XGBoost. Since neural networks offer advantages such as multimodal learning, transfer learning, and automated feature engineering, which are crucial for building scalable models from heterogeneous data with reduced human effort, these observations led to a follow-up study. That study investigates neural network-based algorithms specifically designed for tabular data and compares them against popular tree ensemble-based models. Besides providing a comprehensive analysis of the two families of techniques, it showed that although neural network-based models did not outperform tree-based models, they achieved comparable results and, through transfer learning, made it easier to create more accurate models for new diseases. Module three addresses model maintenance. Climate change leads to unexpected weather events and altered disease occurrence patterns that cause static models to drift rapidly, so models need to be maintained to ensure they perform as required. Capturing real-time data and triggering retraining once enough new data has been collected can help maintain models by acting as a feedback loop for model improvement. This was attempted by collecting crowd-sourced data from a disease recognition app, but the data was not usable in its current form and required further annotation.
Since annotations are expensive and time-consuming, a study on real-life agricultural data retrieval and large-scale annotation workflow optimization based on similarity search is presented; the approach substantially streamlines the annotation process. The results derived from these studies are highly relevant for advancing the United Nations Sustainable Development Goal of Zero Hunger. They are also expected to ease farmers' anxiety about yield loss due to crop diseases and to improve their ability to plan and schedule management practices by giving them an early warning of disease occurrence. The results have been verified through comparison with traditional crop disease prediction methods and through interaction with experienced agronomists working for a major AgTech company.
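The general idea of using similarity search to reduce annotation effort can be sketched as follows. This is a minimal illustration, not the method developed in the thesis, which the abstract does not describe. It assumes each image has already been turned into a numeric feature vector by some model, and it proposes the label of the most similar annotated example when the cosine similarity is high enough; everything else is left for manual review. The function names, the threshold value, and the labels in the usage example are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_labels(labeled, unlabeled, min_similarity=0.9):
    """Propose labels for unlabeled items from their nearest labeled neighbor.

    labeled:   list of (vector, label) pairs already annotated by experts
    unlabeled: dict mapping item id -> vector
    Returns (suggestions, needs_review), where suggestions maps
    item id -> (label, similarity) and needs_review lists the remaining ids.
    """
    suggestions = {}
    needs_review = []
    for item_id, vector in unlabeled.items():
        best_label, best_score = None, -1.0
        # Brute-force scan; a real system would likely use an approximate index.
        for ref_vector, label in labeled:
            score = cosine_similarity(vector, ref_vector)
            if score > best_score:
                best_label, best_score = label, score
        if best_score >= min_similarity:
            suggestions[item_id] = (best_label, best_score)
        else:
            needs_review.append(item_id)
    return suggestions, needs_review


# Usage with made-up two-dimensional vectors and placeholder labels.
reference = [([1.0, 0.0], "disease_a"), ([0.0, 1.0], "healthy")]
pending = {"img_1": [0.9, 0.1], "img_2": [0.5, 0.5]}
auto, manual = suggest_labels(reference, pending)
```

Here annotators would only confirm the suggestion for `img_1` and label `img_2` from scratch, which is where the time saving comes from in this kind of workflow.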
