Article Summary

AI-ML Accounting for Uncertain Water Resources Data Sets

Artificial intelligence (AI) is the effort to automate intellectual tasks normally performed by humans, and it includes machine learning (ML) and deep learning (DL) approaches. AI-ML is based on statistical learning, which tries to learn statistics-based rules for data analyses from known examples of inputs and corresponding outcomes. Data sets that are noisy, include significant uncertainty, and have extreme values hinder statistical learning. ML and DL aquifer recharge predictors are developed to: (1) examine prediction skill when trained using noisy and uncertain data and (2) identify advantages of AI-ML relative to traditional physics- and process-based calculations. Recharge was selected as the learning outcome because it is not observed and is inherently uncertain. A common-sense baseline is developed and implemented to account for uncertainty and noise in AI-ML predictions. The baseline provides a lower goodness-of-fit threshold that identifies when trained AI-ML generates prediction skill and an upper goodness-of-fit threshold above which the AI-ML is learning to reproduce noise and bias in the training data set (and is likely overfitting). Identified advantages for AI-ML (relative to physics- or process-based calculations) are the ability to use dimensionless trends for features and to represent a complex scenario with the same level of effort as for a simple case.
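The two-threshold idea behind the common-sense baseline can be illustrated with a minimal sketch. The summary does not state which goodness-of-fit metric is used or how the thresholds are derived, so everything here is an assumption made for illustration: the Nash-Sutcliffe efficiency is used as an example score, and the function names, threshold values, and data are hypothetical.

```python
from statistics import mean


def nse(targets, predictions):
    """Nash-Sutcliffe efficiency, used here only as an example goodness-of-fit score."""
    target_mean = mean(targets)
    residual = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    variance = sum((t - target_mean) ** 2 for t in targets)
    return 1.0 - residual / variance


def classify_fit(score, lower, upper):
    """Place a score relative to the lower and upper baseline thresholds."""
    if score < lower:
        return "no skill"            # below the baseline: nothing useful learned
    if score > upper:
        return "likely overfitting"  # fit better than the noisy targets justify
    return "skill"


# Hypothetical recharge targets and model predictions (arbitrary units).
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.5, 2.0, 2.5, 4.0]

# Hypothetical thresholds; a real study would derive them from the
# uncertainty in the training and validation targets.
score = nse(targets, predictions)
label = classify_fit(score, lower=0.5, upper=0.95)
print(f"score={score:.2f} -> {label}")
```

The point of the upper threshold is that the targets themselves are uncertain, so a near-perfect fit to them suggests the model is reproducing noise and bias instead of signal.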

Links

Visual Explanation of Recharge Calculations for the Balcones Fault Zone (BFZ) Edwards Aquifer Recharge Zone

Statistical learning of water budget outcomes accounting for target and feature uncertainty

Abstract: Statistical learning seeks to learn statistics-based rules for data analysis tasks from known examples of inputs, or features, and corresponding outcomes, and it includes machine learning (ML) and deep learning (DL) algorithms. Data sets that are noisy, include significant uncertainty, and have extreme values hinder the learning process. In this study, aquifer recharge predictors are developed using four random forest or gradient boosting ML methods and Long Short-Term Memory (LSTM) networks, a DL method, to: (1) examine predictive skill when trained using noisy and uncertain data and (2) identify advantages of statistical learning implementations for prediction of water budget outcomes relative to process-based water budget calculations. Recharge was selected as the learning outcome because it is not observed and is inherently uncertain. The features, or inputs, are precipitation, potential evapotranspiration (PET), and river discharge. These are calculated, or modelled, values that are not directly observed; consequently, they are expected to be noisy and uncertain because they are contaminated with measurement and model error. A common-sense baseline is developed and implemented to account for uncertainty and noise in the outcomes used for training and validation. The baseline delineates a lower goodness-of-fit threshold that identifies when trained ML and DL models generate prediction skill and an upper goodness-of-fit threshold above which the models are learning to reproduce noise and bias. For statistical learning regression implementations, features and outcomes need to be transformed to be Gaussian-like. Inherent variability and extreme events in precipitation, discharge, and recharge data sets require power transformation, or at least scaling of logarithms, to enhance predictive skill.
Identified advantages to statistical learning of water budget outcomes are the ability to use dimensionless trends for features and to represent a complex study site with the same level of effort as a simple site.
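The transformation step mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the study's implementation: the Box-Cox form is one common power transformation and is assumed here, the exponent is fixed by hand instead of being fitted, and the discharge values are hypothetical.

```python
import math
from statistics import mean, pstdev


def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values (lam = 0 gives the log)."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def skewness(values):
    """Population skewness; values near zero indicate a more symmetric distribution."""
    mu = mean(values)
    sigma = pstdev(values)
    return mean([((v - mu) / sigma) ** 3 for v in values])


# Hypothetical discharge series with one extreme event.
discharge = [1.0, 2.0, 3.0, 5.0, 8.0, 400.0]

# "Scaling of logarithms": take logs, then standardize.
scaled_logs = standardize(box_cox(discharge, 0))

print(f"raw skewness:        {skewness(discharge):.2f}")
print(f"scaled-log skewness: {skewness(scaled_logs):.2f}")
```

In this toy series the logarithm pulls the extreme event toward the rest of the data, which reduces skewness and moves the distribution closer to Gaussian-like. In practice the exponent of a power transformation would be estimated from the training data.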

Comparison of Recharge Validation Performance by AI-ML method

Research Keywords

AI-ML

Common-sense Baseline

Deep Learning

Long Short-term Memory Network

Machine Learning

Overfitting

Power Transformation

Risk

Uncertainty Analysis

Water Budget or Water Balance
