Power Transformation

For effective statistical learning, or AI-ML training, individual features should be transformed so that they resemble standard normally distributed data, with zero mean (𝜇 = 0) and unit variance (𝜎² = 1.0, where 𝜎 is the standard deviation). This transformation is called standardization. Power transformation is an advanced standardization approach that seeks to map data from any input distribution shape to one that is close to Gaussian.
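As a rough illustration of the idea, the sketch below applies one common power transform, the Box-Cox transform, to a skewed feature and then standardizes the result to zero mean and unit variance. The choice of Box-Cox, the grid search over the exponent, the synthetic log-normal data, and all function names are assumptions made for this example, not the method used in the work described here. Box-Cox applies only to strictly positive values. Libraries such as scikit-learn (PowerTransformer) offer production-quality versions of this step.

```python
import math
import random
import statistics


def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lam under a normal model (constants dropped)."""
    n = len(values)
    transformed = [box_cox(v, lam) for v in values]
    variance = statistics.pvariance(transformed)
    log_sum = sum(math.log(v) for v in values)
    return -0.5 * n * math.log(variance) + (lam - 1.0) * log_sum


def fit_lambda(values):
    """Pick the exponent on a coarse grid (an assumption; optimizers also work)."""
    grid = [i / 100.0 for i in range(-200, 201)]
    return max(grid, key=lambda lam: box_cox_log_likelihood(values, lam))


def power_transform(values):
    """Box-Cox transform followed by standardization to mean 0, variance 1."""
    lam = fit_lambda(values)
    transformed = [box_cox(v, lam) for v in values]
    mu = statistics.fmean(transformed)
    sigma = statistics.pstdev(transformed)
    return [(t - mu) / sigma for t in transformed], lam


def skewness(values):
    """Sample skewness; close to 0 for symmetric, Gaussian-like data."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Synthetic, strongly right-skewed feature (hypothetical data for illustration).
rng = random.Random(0)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

z, lam = power_transform(raw)

print("fitted exponent:", lam)
print("skewness before:", skewness(raw))
print("skewness after: ", skewness(z))
print("mean, std after:", statistics.fmean(z), statistics.pstdev(z))
```

For a log-normal input the fitted exponent should land near zero, which corresponds to a log transform, and the skewness of the transformed feature should be much closer to zero than that of the raw feature.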

AI-ML Accounting for Uncertain Water Resources Data Sets

Artificial intelligence (AI) is the effort to automate intellectual tasks normally performed by humans, and it includes machine learning (ML) and deep learning (DL) approaches. AI-ML is based on statistical learning, which tries to learn statistics-based rules for data analyses from known examples of inputs and corresponding outcomes. Data sets that are noisy, include significant uncertainty, and have extreme values hinder statistical learning. ML and DL aquifer recharge predictors are developed to: (1) examine prediction skill when trained using noisy and uncertain data and (2) identify advantages of AI-ML relative to traditional physics- and process-based calculations. Recharge was selected as the learning outcome because it is not observed and is inherently uncertain. A common-sense baseline is developed and implemented to account for uncertainty and noise in AI-ML predictions. The baseline provides a lower goodness-of-fit threshold that identifies when trained AI-ML generates prediction skill and an upper goodness-of-fit threshold above which the AI-ML is learning to reproduce noise and bias in the training data set (and is likely overfitting). Identified advantages for AI-ML (relative to physics- or process-based calculations) are the ability to use dimensionless trends for features and to represent a complex scenario with the same level of effort as for a simple case.