Model Representation Error
Model representation error accounts for the difference between what a simulation model represents of reality and what is actually observed during a measurement. All models simplify continuous and effectively infinite reality with discrete representations of the continuous. Consequently, all models generate representation error.
Assimilating Complex Models with Indirect Observations under Data Insufficiency
Data assimilation (DA) provides an optimal combination of model simulation results and observed values. There are four sources of uncertainty in any DA: 1) inherent uncertainty from limitations of scientific knowledge, 2) data insufficiency, meaning insufficient information content in the target observations to constrain parameter selection through history matching, 3) observation or measurement error, and 4) model representation error. Null space sensitivity analysis is a technique for examining data sufficiency. Sensitivity is the variation in model solution values due to variability, or uncertainty, in one or more parameter values. Parameters are in the null space when their variation causes minimal change to history matching skill during assimilation. As a result, null space parameters can be set to any value, constrained only by professional judgement, to produce a best-fit model. Null space parameters that generate significant changes to important model predictions are, however, diagnostic of data insufficiency. We present a new null space sensitivity analysis for the iterative ensemble smoother (iES) algorithm, an ensemble DA method in PEST++. A fundamental advantage of iES is computational efficiency, achieved through empirical sampling of posterior parameter distributions. Our new method leverages post-assimilation uncertainty analysis, rather than computationally expensive Monte Carlo sampling, to determine empirical parameter sensitivity while maintaining the computational advantages of iES. The sensitivity analysis is generated by an ensemble of models in which insensitive parameters vary across their feasible ranges and sensitive parameters are fixed at best-fit values.
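To make the ensemble construction concrete, the following is a minimal sketch of the final step described above, assuming a toy four-parameter groundwater model; the parameter names, bounds, best-fit values, and sensitivity classification are all illustrative assumptions, not taken from PEST++ or any real assimilation:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical parameter setup: best-fit values from history matching,
# feasible bounds, and a sensitivity classification (all illustrative).
par_names   = ["hk_zone1", "hk_zone2", "recharge", "storativity"]
best_fit    = np.array([10.0, 25.0, 0.002, 1e-4])
lower_bound = np.array([1.0, 5.0, 0.0005, 1e-5])
upper_bound = np.array([100.0, 100.0, 0.01, 1e-3])
# True where history matching constrains the parameter (sensitive);
# False where the parameter lies in the null space (insensitive).
sensitive   = np.array([True, True, False, False])

def null_space_ensemble(n_real):
    """Build an ensemble that fixes sensitive parameters at their best-fit
    values and samples null-space parameters across their feasible ranges."""
    ens = np.tile(best_fit, (n_real, 1))
    for j in range(len(par_names)):
        if not sensitive[j]:
            ens[:, j] = rng.uniform(lower_bound[j], upper_bound[j], n_real)
    return ens

ensemble = null_space_ensemble(100)
```

Each ensemble row is then a candidate model run; predictions that vary strongly across rows are controlled by null-space parameters and flag data insufficiency.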
The case study application of the null space sensitivity analysis identified data insufficiency that limits decision support regarding the amount of groundwater storage in the system, and it demonstrated a more than 97% reduction in computational requirements relative to the Null Space Monte Carlo (NSMC) method.
AI-ML Data Uncertainty Risks and Risk Mitigation Using Data Assimilation in Water Resources Management
Artificial intelligence (AI), including machine learning (ML) and deep learning (DL), learns by training and is restricted by the amount and quality of training data. The primary AI-ML risk in water resources is that uncertain data sets will hinder statistical learning to the point where the trained AI provides spurious predictions and thus limited decision support. Overfitting occurs when prediction error during training is significantly smaller than the trained model's generalization error on an independent validation set (data not used in training). Training, or statistical learning, involves a tradeoff (the bias-variance tradeoff) between prediction error (bias) and prediction variability (variance), which is controlled by model complexity. Increased model complexity decreases prediction bias, increases variance, and increases the possibility of overfitting. In contrast, decreased complexity increases prediction error, decreases prediction variability, and reduces the tendency toward overfitting. Better data are the way to make better AI-ML models, but with uncertain water resource data sets there is no quick way to generate improved data. Fortunately, data assimilation (DA) can provide mitigation for data uncertainty risks. The mitigation of uncertain data risks using DA involves a modified bias-variance tradeoff that focuses on increasing solution variability at the expense of increased model bias. Conceptually, the increased variability should represent the amount of data and model uncertainty. Uncertainty propagation then produces an ensemble of models and a range of predictions with the target amount of extra variability.
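The bias-variance tradeoff and overfitting behavior described above can be illustrated with a small synthetic example; the data, the polynomial model family, and the degree choices are illustrative assumptions, not part of the study:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic data: a smooth signal plus noise, split into a training set
# and an independent validation set to expose overfitting.
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
x_tr, y_tr = x[::2], y[::2]      # training half
x_va, y_va = x[1::2], y[1::2]    # validation half (not used in training)

def mse(deg, xs, ys):
    """Mean squared error, on (xs, ys), of a degree-`deg` polynomial
    fit to the training set. Degree controls model complexity."""
    coef = np.polyfit(x_tr, y_tr, deg)
    return float(np.mean((np.polyval(coef, xs) - ys) ** 2))

# Training error always shrinks as complexity grows; validation error
# does not, which is the signature of overfitting.
train_err = {d: mse(d, x_tr, y_tr) for d in (1, 3, 15)}
valid_err = {d: mse(d, x_va, y_va) for d in (1, 3, 15)}
```

The gap between a model's training error and its validation error at high degree is the overfitting signal the abstract describes.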
Dynamic Integration of AI-ML Predictions with Process-Based Model Simulations
Data assimilation (DA) is used to integrate artificial intelligence, including machine learning (AI-ML), and process-based models to produce a dynamic operational water balance tool for groundwater management. The management tool is a three-step calculation. In the first step, a traditional process-based water budget model provides forward model predictions of aquifer storage from meteorological observations, estimates of pumping and diversion discharge, and estimates of recharge. The second step is a Kalman filter-based DA approach that generates updated storage volumes by combining forward model predictions with a trained AI-ML model, which provides replacement 'measurements' for missing observations. The third 'correction' step re-simulates the forward model with modified recharge and pumping, adjusted to account for the difference between the Kalman-updated storage and the forward model predicted storage, to approximate the updated storage volume. Use of modified inputs in the correction provides a mass-conservative water budget framework based on AI-ML predictions. Pumping and recharge values are uncertain and unobserved in the study region and can therefore be adjusted without contradicting measurements.
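A minimal sketch of the three-step calculation follows, using a scalar storage state; the forward model, the error variances, and the AI-ML 'measurement' value are hypothetical stand-ins for the operational tool, chosen only to show the mechanics:

```python
# Step 1: forward model prediction of aquifer storage
# (simple mass balance: storage change = inflow - outflow; units arbitrary).
def forward_model(storage_prev, recharge, pumping):
    return storage_prev + recharge - pumping

S_prev   = 1000.0
recharge = 50.0
pumping  = 80.0
S_fore   = forward_model(S_prev, recharge, pumping)

# Step 2: scalar Kalman update combining the forward prediction with an
# AI-ML storage estimate standing in for a missing observation.
P = 25.0          # forward model error variance (assumed)
R = 16.0          # AI-ML 'measurement' error variance (assumed)
z = 985.0         # AI-ML storage estimate (assumed)
K = P / (P + R)   # Kalman gain
S_update = S_fore + K * (z - S_fore)

# Step 3: correction - adjust the uncertain recharge input so that
# re-simulation reproduces the updated storage, keeping the water
# budget mass conservative.
delta        = S_update - S_fore
recharge_adj = recharge + delta
S_corr       = forward_model(S_prev, recharge_adj, pumping)
```

Because the correction is applied through the inputs rather than by overwriting the state, every updated storage volume remains consistent with a closed water budget.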
An Observation Error Model for River Discharge Observed at Gauging Stations
Data assimilation (DA) provides an optimal combination of model simulation results and observed, or measured, values. Ensemble methods are a form of DA that generates multiple equally good, or equally calibrated, models using a description of model and observation uncertainty. Uncertainty is lack of knowledge. The collection of equally good models, in the presence of uncertainty, is an ensemble of models. An observation error model provides the means to describe the amount of uncertainty in model simulation results and in observed values as part of assimilation. Model-related uncertainty comes from model representation limitations created by differences between what the model represents, or simulates, and what is measured to make an observation. Observation uncertainty comes from observation error. When an observed value is calculated or estimated, rather than measured, additional uncertainty is generated by the estimation procedure.
An observation error model is developed and presented for river discharge observations made at a stream gauging station, where a derived rating curve is used to calculate discharge from measured water depth. A rating curve is a poor hydrodynamic model. Consequently, large estimation errors are expected for river discharge calculated using a rating curve, which generates correspondingly large amounts of observation uncertainty for assimilation. Uncertainty is propagated through DA to the spread, or variability, of model outcomes provided by the ensemble of models. When assimilating simulation results and data with significant uncertainty, the goal of assimilation is to optimize the bias-variance tradeoff and thus the spread of ensemble outcomes. Optimizing this tradeoff involves limiting the amount of uncertainty as much as possible, to support informed decisions, while including sufficient uncertainty to avoid overfitting. The risk from overfitting is production of biased model outcomes and spurious decision support.
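As an illustration of how such an observation error model can be specified, the sketch below assumes a hypothetical power-law rating curve and a multiplicative error structure with a constant coefficient of variation; the rating coefficients and the 25% relative error are illustrative assumptions, not values from the presented error model:

```python
import numpy as np

# Hypothetical rating curve Q = C * (h - h0)**b (coefficients illustrative).
C, h0, b = 35.0, 0.2, 1.6

def discharge(h):
    """Convert gauged water depth h (m) to discharge via the rating curve."""
    return C * np.maximum(h - h0, 0.0) ** b

# Multiplicative error model: rating-curve discharge errors tend to scale
# with flow magnitude, so assume a constant coefficient of variation (CV).
# A large CV encodes the large estimation error expected from a rating curve.
cv = 0.25  # assumed 25% relative standard error

depths = np.array([0.5, 1.0, 2.0, 4.0])   # example gauge readings (m)
q_obs  = discharge(depths)                 # discharge 'observations'
sigma  = cv * q_obs                        # observation standard deviation
weight = 1.0 / sigma                       # assimilation weight per observation
```

High flows thus receive large absolute uncertainty and small assimilation weight, which is how the error model keeps rating-curve error from dominating the ensemble spread.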
Using Weather Attribution for Robust Representation of Present and Future Extreme Weather Events
Weather attribution estimates the current and near-future likelihood of a recently observed extreme weather event, like a drought or hurricane. It uses climate models, weather prediction models, and observed weather to determine how much more likely the observed event is today relative to the recent past, such as the 1990s and 2000s. In this study, a statistical weather generator (WG) creates synthetic sequences of future precipitation, temperature, and potential evapotranspiration that represent the increased likelihood of three-month severe drought. An independent weather attribution study identified that three-month severe drought is five times more likely to occur today relative to recent historical conditions. The WG-simulated conditions portray a near future where historical extreme and severe drought are significantly more likely to occur. The climate description produced by this WG is representative of the weather attribution study and is significantly hotter, with lower expected soil moisture, than the future climate description obtained from general circulation, i.e., climate, model (GCM) simulation results by themselves.
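The role of a WG in representing increased drought likelihood can be sketched with a toy first-order Markov chain wet/dry precipitation generator; the transition probabilities below are illustrative assumptions and far simpler than the WG used in the study, but they show how raising dry-spell persistence shifts three-month dryness:

```python
import random

def simulate_dry_fraction(p_dd, p_wd, n_days=90, n_real=2000, seed=42):
    """First-order Markov chain wet/dry generator.
    p_dd: P(dry tomorrow | dry today); p_wd: P(dry tomorrow | wet today).
    Returns the mean fraction of dry days across three-month realizations."""
    rng = random.Random(seed)
    total_dry = 0
    for _ in range(n_real):
        dry = rng.random() < 0.5  # arbitrary initial state
        for _ in range(n_days):
            p = p_dd if dry else p_wd
            dry = rng.random() < p
            total_dry += dry
    return total_dry / (n_real * n_days)

# 'Historical' transition probabilities (illustrative) versus a WG
# parameterization with raised dry-spell persistence, standing in for
# the attributed increase in three-month drought likelihood.
hist_dry = simulate_dry_fraction(p_dd=0.70, p_wd=0.30)
attr_dry = simulate_dry_fraction(p_dd=0.85, p_wd=0.45)
```

In an actual attribution-informed WG, such parameters would be tuned until the simulated three-month severe drought frequency matches the attributed likelihood ratio (five times the historical rate in this study).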
Particle Tracking for Transport Simulation
One way to determine the fluid velocity, or concentration, at a point in space and time is to trace a particle pathline backward through space and time to its starting location at the previous solution time. The departure point is the beginning point of the pathline at the previous solution time. Because the spatial distribution, or field, of the constituent that moves with the fluid is known at the previous solution time, the value at the departure point can be determined using spatial interpolation from the known field values. A new way to determine departure point location, called the semi-analytical upwind pathline tracing (SUT) method, is presented that uses a semi-analytical solution for particle tracking rather than a discrete numerical solution like the Euler and Runge-Kutta methods. The semi-analytical solution provides a way to move entirely across a cell in one calculation, while the discrete methods must divide the calculation into small pathline segments, or sub-calculations, for accuracy. Consequently, the SUT method has accuracy equivalent to discrete numerical solution approaches and can provide significantly improved computational efficiency for relatively long time step durations.
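The contrast between a semi-analytical single-calculation cell crossing and discrete sub-stepping can be sketched in one dimension with a generic Pollock-style linear-velocity solution; this is not the SUT method itself, and the cell geometry and face velocities are illustrative:

```python
import math

# One grid cell [x0, x1] with face velocities v0, v1 (illustrative values).
# Velocity varies linearly inside the cell: v(x) = v0 + A * (x - x0).
x0, x1 = 0.0, 1.0
v0, v1 = 1.0, 3.0
A = (v1 - v0) / (x1 - x0)

def v(x):
    return v0 + A * (x - x0)

def semi_analytical(xp, dt):
    """Exact position after dt under linear velocity: one calculation
    carries the particle any distance within the cell."""
    return x0 + (v(xp) * math.exp(A * dt) - v0) / A

def euler(xp, dt, n_sub):
    """Discrete Euler tracking: many small pathline sub-steps for accuracy."""
    h = dt / n_sub
    for _ in range(n_sub):
        xp = xp + h * v(xp)
    return xp

dt = 0.4
x_exact  = semi_analytical(0.0, dt)
x_euler1 = euler(0.0, dt, n_sub=1)      # one coarse step: large error
x_eulerN = euler(0.0, dt, n_sub=10000)  # many sub-steps converge to exact
```

The discrete method only matches the single semi-analytical calculation after thousands of sub-steps, which is the source of the computational advantage claimed for long time steps.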