Assimilating Complex Models with Indirect Observations under Data Insufficiency
Data assimilation (DA) provides an optimal combination of model simulation results with observed values. There are four sources of uncertainty in any DA: 1) inherent uncertainty from limitations of scientific knowledge, 2) data insufficiency, meaning insufficient information content in target observations to constrain parameter selection through history matching, 3) observation or measurement error, and 4) model representation error. Null space sensitivity analysis is a technique for examining data sufficiency. Sensitivity is the variation in model solution values due to variability, or uncertainty, in one or more parameter values. Parameters are in the null space when their variation causes minimal change to history matching skill during assimilation. As a result, null space parameters can be set to any value, constrained only by professional judgement, to produce a best-fit model. Null space parameters that generate significant changes to important model predictions are, however, diagnostic of data insufficiency. We present a new null space sensitivity analysis for the iterative ensemble smoother (iES) algorithm, which provides an ensemble method for DA, in PEST++. A fundamental advantage of iES is computational efficiency obtained through empirical sampling of posterior parameter distributions. Our new method leverages post-assimilation uncertainty analysis, rather than robust, but computationally expensive, Monte Carlo sampling, to determine empirical parameter sensitivity while maintaining the computational advantages of iES. The sensitivity analysis is generated by an ensemble of models in which insensitive parameters vary across their feasible ranges and sensitive parameters are fixed to best-fit model values. The case study application of the null space sensitivity analysis identified data insufficiency that limits decision support regarding the amount of groundwater storage in the system, and it demonstrated a more than 97% reduction in computational requirements relative to the Null Space Monte Carlo (NSMC) method.
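A minimal sketch of the post-assimilation workflow follows, assuming a posterior parameter ensemble and objective function (phi) values are already available from an iES run; the array shapes, bounds, and spread threshold are illustrative assumptions, not PEST++ output or settings.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior ensemble from an iES run: rows are realizations,
# columns are parameters (values below are placeholders for illustration).
n_real, n_par = 300, 4
post_ens = rng.normal(loc=[1.0, 5.0, 0.2, 50.0],
                      scale=[0.01, 2.0, 0.001, 30.0],
                      size=(n_real, n_par))
phi = rng.normal(10.0, 0.5, size=n_real)      # history-matching objective values
lower = np.array([0.5, 0.0, 0.1, 0.0])        # feasible parameter bounds
upper = np.array([1.5, 10.0, 0.3, 100.0])

# Empirical sensitivity proxy: posterior spread relative to the feasible
# range. Parameters that assimilation left spread across their range were
# not constrained by the observations, i.e., they are in the null space.
rel_spread = post_ens.std(axis=0) / (upper - lower)
null_space = rel_spread > 0.25                # threshold is a judgment call

# Build the sensitivity-analysis ensemble: null-space parameters sampled
# uniformly across their feasible ranges, sensitive parameters fixed at
# the best-fit (minimum-phi) realization.
best = post_ens[np.argmin(phi)]
sa_ens = np.tile(best, (n_real, 1))
sa_ens[:, null_space] = rng.uniform(lower[null_space], upper[null_space],
                                    size=(n_real, null_space.sum()))

# Running the model over sa_ens and inspecting the spread of important
# predictions then diagnoses data insufficiency.
print("null-space parameter indices:", np.where(null_space)[0])
```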
Sustainable Water Resource Management: A Future Flood Inundation Example
Sustainability is meeting the needs of the present without jeopardizing quality of life for future generations. Adaptation is adjustment of resource utilization and planning by current generations to ensure sustainability. Mitigation, for this study, narrowly refers to damage repair and restoration costs incurred after a natural hazard occurs. Climate is dynamic and ever-changing. Recently observed changes in weather patterns indicate that drought and intense precipitation, leading to flooding, are more likely to occur in the near future. An example dynamic probabilistic risk assessment (PRA) for flood inundation is created and applied to understand the benefits to, and limitations on, PRA for sustainable water resource management. This example addresses the issue of sustainable decision making related to outdated, but historically regulatory-compliant, infrastructure. The observed increase in the likelihood of large floods means that many assets were designed for conditions that no longer apply and are more likely to be damaged in the future. Results from this example demonstrate that PRA provides a means to optimize the degree of sustainability included in resource management and decision making. Sustainability optimization is obtained by balancing the likelihood of future mitigation costs against potential cost savings garnered from present-day adaptation.
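The cost balance at the core of the optimization can be illustrated with a simple expected present value comparison; every number below (flood probability, damage, adaptation cost, discount rate, horizon) is a hypothetical placeholder.

```python
# Minimal sketch of the mitigation-versus-adaptation cost balance.
annual_p_flood = 0.04          # updated (non-stationary) flood likelihood
damage_per_event = 2.0e6       # mitigation (repair/restoration) cost per flood
adapt_cost = 1.0e6             # one-time, present-day adaptation cost
rate, horizon = 0.03, 30       # discount rate and planning horizon (years)

# Expected present value of future mitigation costs without adaptation.
ev_mitigation = sum(annual_p_flood * damage_per_event / (1 + rate) ** t
                    for t in range(1, horizon + 1))
print(f"expected mitigation PV: ${ev_mitigation:,.0f}")
print("adaptation pays off" if adapt_cost < ev_mitigation else
      "adaptation not justified")
```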
AI-ML Data Uncertainty Risks and Risk Mitigation Using Data Assimilation in Water Resources Management
Artificial intelligence (AI), including machine learning (ML) and deep learning (DL), learns by training and is restricted by the amount and quality of training data. The primary AI-ML risk in water resources is that uncertain data sets will hinder statistical learning to the point where the trained AI provides spurious predictions and thus limited decision support. Overfitting occurs when prediction error during training is significantly smaller than the generalization error of the trained model on an independent validation set (data not used in training). Training, or statistical learning, involves a tradeoff (the bias-variance tradeoff) between prediction error (or bias) and prediction variability (or variance), which is controlled by model complexity. Increased model complexity decreases prediction bias, increases variance, and increases the possibility of overfitting. In contrast, decreased complexity increases prediction error, decreases prediction variability, and reduces tendencies toward overfitting. Better data are the way to make better AI-ML models. With uncertain water resource data sets, there is no quick way to generate improved data. Fortunately, data assimilation (DA) can provide mitigation for data uncertainty risks. The mitigation of uncertain data risks using DA involves a modified bias-variance tradeoff that focuses on increasing solution variability at the expense of increased model bias. Conceptually, the increased variability should represent the amount of data and model uncertainty. Uncertainty propagation then produces an ensemble of models and a range of predictions with the target amount of extra variability.
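The training-versus-validation signature of the tradeoff is easy to reproduce; the following sketch fits polynomials of increasing complexity to synthetic noisy data, with the data-generating function and noise level chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bias-variance demonstration: noisy samples of a known function are
# fit with polynomials of increasing complexity (degree).
x_tr, x_va = rng.uniform(0, 1, 40), rng.uniform(0, 1, 40)
f = lambda x: np.sin(2 * np.pi * x)
y_tr = f(x_tr) + rng.normal(0, 0.3, x_tr.size)
y_va = f(x_va) + rng.normal(0, 0.3, x_va.size)

for degree in (1, 3, 9, 12):
    coef = np.polyfit(x_tr, y_tr, degree)
    rmse = lambda x, y: np.sqrt(np.mean((np.polyval(coef, x) - y) ** 2))
    # A training RMSE far below the validation RMSE flags overfitting.
    print(f"degree {degree:2d}: train {rmse(x_tr, y_tr):.3f}, "
          f"validation {rmse(x_va, y_va):.3f}")
```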
An Observation Error Model for River Discharge Observed at Gauging Stations
Data assimilation (DA) provides an optimal combination of model simulation results and observed, or measured, values. Ensemble methods are a form of DA that generates multiple equally good, or equally calibrated, models using a description of model and observation uncertainty. Uncertainty is lack of knowledge. The collection of models that are equally good in the presence of uncertainty is an ensemble of models. An observation error model provides the means to describe the amount of uncertainty in model simulation results and in observed values as part of assimilation. Model-related uncertainty comes from model representation limitations created by differences between what the model represents, or simulates, and what is measured to make an observation. Observation uncertainty comes from observation error. When an observed value is calculated or estimated, rather than measured, additional uncertainty is generated by the estimation procedure.
An observation error model is developed and presented for river discharge observations made at a stream gauging station, where a measured water depth value is combined with a derived rating curve to calculate discharge. A rating curve is a poor hydrodynamics model. Consequently, large estimation errors are expected for river discharge calculated using a rating curve, which generates correspondingly large amounts of observation uncertainty for assimilation. Uncertainty is propagated through DA to the spread, or variability, of model outcomes provided by the ensemble of models. When assimilating simulation results and data with significant uncertainty, the goal of assimilation is to optimize the bias-variance tradeoff and thus the spread of ensemble outcomes. Optimizing this tradeoff involves limiting the amount of uncertainty as much as possible to make informed decisions while including sufficient uncertainty to avoid overfitting. The risk from overfitting is the production of biased model outcomes and spurious decision support.
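The structure of such an error model can be sketched as follows, assuming the standard power-law rating curve form Q = a(h - h0)^b; the coefficients and error fractions are illustrative choices, not values from this work.

```python
import numpy as np

# Sketch of a heteroscedastic observation error model for rating-curve
# discharge. Coefficients and error magnitudes are assumptions.
a, h0, b = 5.0, 0.2, 1.8
stage = np.array([0.5, 1.0, 2.0, 4.0])             # gauged water depth (m)
q_obs = a * (stage - h0) ** b                      # rating-curve discharge

# Depth measurement error propagates through the curve (first order),
# and the rating curve itself contributes a multiplicative error term.
sigma_h = 0.02                                     # stage error (m)
dq_dh = a * b * (stage - h0) ** (b - 1)            # local curve slope
sigma_meas = dq_dh * sigma_h                       # propagated stage error
sigma_curve = 0.15 * q_obs                         # assumed 15% curve error
sigma_q = np.sqrt(sigma_meas ** 2 + sigma_curve ** 2)

# sigma_q sets observation weights (inverse variance) for assimilation.
for q, s in zip(q_obs, sigma_q):
    print(f"Q = {q:7.2f} m3/s  ->  sigma = {s:6.2f} m3/s")
```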
Dynamic Integration of AI-ML Predictions with Process-Based Model Simulations
Data assimilation (DA) is used to integrate artificial intelligence and machine learning (AI-ML) with process-based models to produce a dynamic operational water balance tool for groundwater management. The management tool is a three-step calculation. In the first step, a traditional process-based water budget model provides forward model predictions of aquifer storage from meteorological observations, estimates of pumping and diversion discharge, and estimates of recharge. The second step is a Kalman filter-based DA approach that generates updated storage volumes by combining forward model predictions with a trained AI-ML model that provides replacement 'measurements' for missing observations. The third, 'correction' step re-simulates the forward model with modified recharge and pumping, adjusted to account for the difference between the Kalman-updated storage and the forward model predicted storage, to approximate the updated storage volume. Use of modified inputs in the correction provides a mass-conservative water budget framework based on AI-ML predictions. Pumping and recharge values are uncertain and unobserved in the study region and can be adjusted without contradicting measurements.
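A scalar sketch of the three-step calculation is given below; the water budget values, error variances, and AI-ML 'measurement' are hypothetical, and attributing the entire correction to recharge is one simplifying choice among several.

```python
# Minimal scalar sketch of the three-step storage calculation.
storage_prev = 100.0                 # previous updated storage (volume units)
recharge, pumping = 12.0, 8.0        # uncertain water budget inputs

# Step 1: forward (process-based) water budget prediction.
storage_fwd = storage_prev + recharge - pumping
var_fwd = 25.0                       # forward prediction error variance

# Step 2: Kalman update using an AI-ML prediction as a replacement
# 'measurement' where observations are missing.
storage_ml, var_ml = 106.0, 9.0
gain = var_fwd / (var_fwd + var_ml)
storage_upd = storage_fwd + gain * (storage_ml - storage_fwd)

# Step 3: 'correction' re-simulation; the update-minus-forward difference
# is pushed into the unobserved recharge term so the budget closes (mass
# conservation) at the updated storage.
correction = storage_upd - storage_fwd
recharge_adj = recharge + correction
storage_resim = storage_prev + recharge_adj - pumping
assert abs(storage_resim - storage_upd) < 1e-9
print(f"forward {storage_fwd:.2f} -> updated {storage_upd:.2f}")
```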
AI-ML Accounting for Uncertain Water Resources Data Sets
Artificial intelligence (AI) is the effort to automate intellectual tasks normally performed by humans, and it includes machine learning (ML) and deep learning (DL) approaches. AI-ML is based on statistical learning, which tries to learn statistics-based rules for data analysis from known examples of inputs and corresponding outcomes. Data sets that are noisy, include significant uncertainty, and have extreme values hinder statistical learning. ML and DL aquifer recharge predictors are developed to (1) examine prediction skill when trained using noisy and uncertain data and (2) identify advantages of AI-ML relative to traditional physics- and process-based calculations. Recharge was selected as the learning outcome because it is not observed and is inherently uncertain. A common-sense baseline is developed and implemented to account for uncertainty and noise in AI-ML predictions. The baseline provides a lower goodness-of-fit threshold that identifies when trained AI-ML generates prediction skill and an upper goodness-of-fit threshold above which the AI-ML is learning to reproduce noise and bias in the training data set (and is likely overfitting). Identified advantages of AI-ML, relative to physics- or process-based calculations, are the ability to use dimensionless trends for features and the ability to represent a complex scenario with the same level of effort as a simple case.
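The bracketing logic of the baseline can be sketched as follows, using a synthetic recharge target with a known noise level; the distributions and the trained-model error value are assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 'true' recharge and a noisy training target (all assumed).
true = rng.gamma(2.0, 5.0, size=200)
noise_sd = 4.0
target = true + rng.normal(0, noise_sd, true.size)

def rmse(pred, obs):
    return np.sqrt(np.mean((pred - obs) ** 2))

# Lower skill threshold: a trained model must beat a naive baseline
# (here, predicting the training-set mean everywhere).
baseline_rmse = rmse(np.full_like(target, target.mean()), target)

# Upper threshold: fitting below the noise floor means the model is
# reproducing noise/bias in the target, i.e., likely overfitting.
noise_floor = noise_sd

model_rmse = 5.5                     # hypothetical trained-model RMSE
if model_rmse >= baseline_rmse:
    print("no skill beyond the common-sense baseline")
elif model_rmse < noise_floor:
    print("fit is below the noise floor: likely overfitting")
else:
    print("model shows real skill without obvious overfitting")
```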
Using Weather Attribution for Robust Representation of Present and Future Extreme Weather Events
Weather attribution estimates the current and near-future likelihood of a recently observed extreme weather event, like a drought or hurricane. It uses climate models, weather prediction models, and observed weather to determine how much more likely the observed event is today relative to the recent past, such as the 1990s and 2000s. In this study, a statistical weather generator (WG) creates synthetic sequences of future precipitation, temperature, and potential evapotranspiration that represent the increased likelihood of three-month severe drought. An independent weather attribution study identified that three-month severe drought is five times more likely to occur today relative to recent historical conditions. The WG-simulated conditions portray a near future where historical extreme and severe drought are significantly more likely to occur. The climate description produced by this WG is representative of the weather attribution study and is significantly hotter, with lower expected soil moisture, than the future climate description obtained from global circulation model (GCM, i.e., climate model) simulation results alone.
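One way to condition a WG on an attribution result is sketched below, assuming a simple parametric distribution of three-month precipitation totals and shifting its mean until the drought probability is five times the historical value; the distribution and parameters are illustrative, not the study's WG configuration.

```python
import numpy as np
from scipy import stats

# Assumed historical distribution of 3-month precipitation totals (mm).
hist_mean, hist_sd = 180.0, 60.0
hist = stats.norm(hist_mean, hist_sd)

p_hist = 0.02                                # historical drought probability
drought_threshold = hist.ppf(p_hist)         # precip total defining drought

# Find the shifted mean that makes drought five times more likely,
# holding the standard deviation fixed.
p_future = 5.0 * p_hist
future_mean = drought_threshold - hist_sd * stats.norm.ppf(p_future)
print(f"historical mean {hist_mean:.0f} mm -> future mean {future_mean:.0f} mm")

# Synthetic future sequence reflecting the attribution-based likelihood.
rng = np.random.default_rng(11)
future_precip = rng.normal(future_mean, hist_sd, size=1000)
print("simulated drought frequency:",
      np.mean(future_precip < drought_threshold))
```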
Collocating Saltwater Disposal Wells (SDWs) and Legacy Oil and Gas (O&G) is a Bad Idea
Placement of a saltwater disposal well (SDW) within the footprint of a mature oil and gas (O&G) exploration and production region is a bad idea. A long history of O&G exploration means many deep wells (> 5,000 ft below ground surface, bgs) have been installed, and many of these wells were installed prior to modern construction standards, permitting requirements, and data tracking capabilities. Consequently, there are likely to be many poorly constructed and unplugged deep wells whose locations have been forgotten. The purpose of deep well disposal is to segregate the disposal fluids, i.e., harmful waste, from the environment, which includes underground sources of drinking water (USDW). Sequestration and containment of harmful wastes is eliminated when unknown, deep, and improperly abandoned wells pierce containment. This lack of adequate confinement for deep waste disposal is common in Texas because of the prevalence of legacy O&G fields and relatively relaxed permitting requirements for SDWs. This paper demonstrates that locating Texas Class II disposal wells (SDWs) and O&G activities within the same area increases waste containment failure likelihood by a factor of two relative to generic SDWs in other states and by a factor of 100 relative to Class I hazardous waste (Class IH) injection well systems.
Projecting Climate Change Impacts to Watershed Water Resources
A methodology is presented for predicting impacts and risks to water resources, at the watershed scale, from an uncertain future climate. It is then applied to estimate impacts to a semi-arid watershed in Texas. Because all models of water movement and storage in watersheds provide estimates (and best guesses), rather than absolute answers, and because the specifics of future weather are unknown, this approach uses likelihoods (or probabilities) for relative change in magnitude, ∆, between future and historical precipitation, evapotranspiration, storm runoff, and aquifer recharge to evaluate future risk to water availability. Projected (future) climate trends for the study site from climate models are: a 3 °C increase in average temperature, which means that potential evapotranspiration will increase; no significant change in average annual precipitation, which means that there generally will not be more water available for evaporation; and a semi-arid classification from 2011–2100. Future precipitation is projected as unchanged for typical conditions. Consequently, no significant change is estimated for evapotranspiration, runoff, or recharge under average conditions. Given expectations for a significant temperature increase, an increase in rainfall is needed to increase evapotranspiration, runoff, and recharge. Increases in rainfall during infrequent large storms are included in the analysis for future conditions, which produces increased water availability during infrequent extreme events but does not change expectations for average conditions.
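The ∆-likelihood idea can be sketched with a small Monte Carlo comparison; the distributions below are placeholders chosen so that typical conditions are unchanged while the upper tail grows, mirroring the projected pattern.

```python
import numpy as np

rng = np.random.default_rng(5)

# Sketch of the delta (relative change) likelihood calculation for one
# water budget component; lognormal parameters are assumptions.
hist = rng.lognormal(mean=3.0, sigma=0.4, size=100_000)    # historical recharge
future = rng.lognormal(mean=3.0, sigma=0.5, size=100_000)  # future recharge

# Typical conditions: relative change in the median.
delta_median = np.median(future) / np.median(hist)
print(f"median delta: {delta_median:.2f}")                 # ~1, no change

# Extreme conditions: relative change in the 95th percentile, and the
# likelihood that a future year exceeds the historical 95th percentile.
q95_hist = np.percentile(hist, 95)
delta_q95 = np.percentile(future, 95) / q95_hist
print(f"95th percentile delta: {delta_q95:.2f}")           # > 1, larger extremes
print(f"P(future > historical 95th pct): {np.mean(future > q95_hist):.3f}")
```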
Estimating Combined Climate Change and Land Use/Land Cover Change Impacts on Water Resources
Climate change and changes to land use and land cover (LULC) both impact water resources, and they have interacting influences on the amount of water available for management and consumption. The framework for assessing relative risk to watershed-scale water resources from systemic changes, presented in 'Projecting Climate Change Impacts to Watershed Water Resources', is used again to predict combined climate and LULC change impacts from 2011–2100 for the same semi-arid watershed in Texas. In the application, the LULC change is an increase in impervious area from economic development. It generates a 1.1-times increase in average water availability, relative to future climate trends alone, from increased runoff and decreased evapotranspiration.
Two-Dimensional (2D) River Flow and Inundation Simulation Model
MOD_FreeSurf2D is a generally applicable computer model that simulates water movement and depth in rivers, streams, and shallow estuaries. It is implemented in MATLAB using well-established semi-Lagrangian and semi-implicit numerical algorithms that solve the depth-averaged, shallow water equations. MOD_FreeSurf2D has been validated against a dam-break flume experiment and against three-dimensional river velocity and depth observations at the reach scale. An advantage of MOD_FreeSurf2D is that it can explicitly find and simulate the moving land/water boundary during flooding and tidal surge directly from topography and bathymetry.
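The wetting and drying logic that allows a model of this kind to track the moving boundary can be sketched in a few lines; the grid, elevations, and dry-cell threshold below are illustrative and not taken from MOD_FreeSurf2D itself.

```python
import numpy as np

# A cell is wet when the free-surface elevation exceeds its bed
# (topography/bathymetry) elevation by a small threshold.
bed = np.array([[2.0, 1.5, 1.0],
                [1.8, 1.2, 0.8],
                [1.6, 1.0, 0.5]])          # bed elevation (m)
eta = 1.3                                  # free-surface elevation (m)
dry_tol = 0.01                             # minimum depth to count as wet (m)

depth = np.maximum(eta - bed, 0.0)         # depth-averaged water depth
wet = depth > dry_tol                      # moving land/water boundary mask
print("wet cells:\n", wet)
print("depths:\n", depth)
```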
Particle Tracking for Transport Simulation
One way to determine the fluid velocity, or constituent concentration, at a point in space and time is to trace a particle pathline backwards through space and time to its starting location at the previous solution time. This departure point is the beginning of the pathline at the previous solution time. Because the spatial distribution, or field, of the constituent that moves with the fluid is known at the previous solution time, the value at the departure point can be determined using spatial interpolation from the known field values. A new way to determine departure point location, called the semi-analytical upwind path line tracing (SUT) method, is presented that uses a semi-analytical solution for particle tracking rather than a discrete numerical solution like the Euler and Runge-Kutta methods. The semi-analytical solution provides a way to move entirely across a cell in one calculation, while the discrete methods must divide the calculation into small pathline segments, or sub-calculations, for accuracy. Consequently, the SUT method has accuracy equivalent to discrete numerical solution approaches and can provide significantly improved computational efficiency for relatively long time step durations.
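The efficiency argument can be illustrated in one dimension, assuming a single cell with linearly interpolated face velocities (in the spirit of Pollock-type semi-analytical tracking); the geometry and velocities are arbitrary choices for the sketch.

```python
import numpy as np

# One cell with faces at x1, x2 and face velocities u1, u2 (all assumed).
x1, x2 = 0.0, 10.0                 # cell faces (m)
u1, u2 = 0.5, 2.0                  # face velocities (m/s), both positive
A = (u2 - u1) / (x2 - x1)          # in-cell velocity gradient

def u(x):
    return u1 + A * (x - x1)       # linearly interpolated velocity

x0, dt = 1.0, 4.0                  # start position and tracking time

# Semi-analytical: dx/dt = u1 + A*(x - x1) integrates exactly to an
# exponential, so the whole step is one calculation.
x_analytic = x1 + (u(x0) * np.exp(A * dt) - u1) / A

# Discrete alternative: Euler must subdivide the step for accuracy.
for n_sub in (1, 10, 1000):
    x = x0
    for _ in range(n_sub):
        x += u(x) * (dt / n_sub)
    print(f"Euler, {n_sub:5d} sub-steps: x = {x:.4f} m")
print(f"semi-analytical (exact):    x = {x_analytic:.4f} m")
```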