AI-ML
Artificial intelligence (AI) is the effort to automate intellectual tasks normally performed by humans. Machine learning (ML) and deep learning (DL) are types of AI that rely on statistical learning algorithms.
Bias–variance Tradeoff
Model training, or calibration, involves a tradeoff between prediction bias and variance that is controlled by model complexity. Increased model complexity decreases prediction bias, increases variance, and increases the likelihood of overfitting. Regularization techniques reduce model complexity, which generally increases prediction bias, decreases variance, and reduces the likelihood of overfitting.
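The effect of regularization described above can be sketched with a one-parameter ridge regression, y ≈ w·x, whose closed-form solution is w = Σxy / (Σx² + λ). The data values and penalty λ below are illustrative assumptions: a larger λ shrinks the estimate toward zero, trading increased bias for reduced variance.

```python
# Hypothetical bias-variance illustration: one-parameter ridge regression.
# Larger lam (regularization strength) shrinks the slope estimate toward
# zero, increasing bias while decreasing variance.

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate of the slope for y ~ w * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]                 # roughly y = 2x with noise

w_unreg = ridge_slope(xs, ys, lam=0.0)    # low bias, higher variance
w_reg = ridge_slope(xs, ys, lam=10.0)     # shrunk toward zero: higher bias

print(round(w_unreg, 3), round(w_reg, 3))
```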
Circularity
Circularity provides a holistic focus on a complete and closed resource life cycle. A cycle does not have a terminal event involving 'disposal' and generation of 'waste'. Circularity is related to sustainability but is fundamentally different. Sustainability focuses on rationing and is thus zero sum while circularity focuses on regeneration of functional resources rather than consumption.
Common-sense Baseline
Because AI models tend to be difficult to interpret and prone to overfitting during training, it is important to use a common-sense baseline as part of model evaluation. The creation of a trivial common-sense baseline for initial confirmation of generalization ability is an important quality assurance and quality control (QA/QC) procedure prior to and during training.
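A minimal sketch of the QA/QC check described above: compare the trained model's error against a trivial predictor that always outputs the mean of the training targets. All values below are made up for illustration.

```python
# Common-sense baseline sketch: the model should beat a trivial
# mean predictor on held-out data before its output is trusted.

def mae(preds, obs):
    """Mean absolute error between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(preds, obs)) / len(obs)

train_targets = [10.0, 12.0, 11.0, 13.0]
test_obs = [11.5, 12.5, 10.5]
model_preds = [11.2, 12.8, 10.9]          # hypothetical model output

baseline = sum(train_targets) / len(train_targets)   # trivial predictor
baseline_preds = [baseline] * len(test_obs)

model_mae = mae(model_preds, test_obs)
baseline_mae = mae(baseline_preds, test_obs)
print(model_mae < baseline_mae)           # generalization sanity check
```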
Data Assimilation
Data assimilation (DA) is a collection of approaches for optimal combination of information from model simulations with observations. It uses a 'forward' model to make predictions. Measurements, or observed values, are combined with forward model predictions to derive updated values. The goal is to obtain the 'best' description of a dynamical system and inherent uncertainty with the updated values. It is generally used to (1) compute the best possible estimate of a model state at a given point in time and (2) implement the inverse-style estimation of model parameters or deduction of optimal model forcing, given all historic observations.
Deep Learning
Deep learning (DL) approaches are artificial neural network methods that can use multiple neuron layers and are deep in the sense of having more than one learning layer within the algorithm.
Deep Uncertainty
Uncertainty means lack of knowledge. Deep uncertainty extends the traditional uncertainty definition with two additional caveats: 1) inability to make well-informed future projections of a system, based on available data and understanding, and 2) inability to reduce this uncertainty, for near-term decision making, by gathering additional information. The concept of deep uncertainty may also include consideration of the stakeholders’ preferences regarding outcomes. Stakeholders’ preferences are often politically motivated and oriented toward gaining political power or economic advantage for a particular group. Consequently, the political power components of deep uncertainty are not traditionally included in engineering, scientific, or planning uncertainty.
Depth–averaged Shallow Water Equations
A set of equations, including momentum and continuity components, that represents fluid flow in streams, rivers, and shallow estuaries. These equations are derived from the Reynolds–Averaged Navier–Stokes (RANS) equations under the assumptions of hydrostatic pressure, a well–mixed water column, and a small depth–to–width ratio.
Ensemble Methods
Ensemble methods are a form of data assimilation (DA) that provides inverse–style model calibration, or training, by adjusting an ensemble of parameter realizations, and the associated ensemble of residuals, to seek an approximate posterior parameter ensemble. Residuals are the differences between observations, or targets, and simulated values. The PEST++ Iterative Ensemble Smoother (iES) tool, PESTPP-IES, is an example of an ensemble method, and ensemble methods are relatively computationally efficient for large numbers of input parameters.
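A toy, one-parameter ensemble-smoother update, loosely in the spirit of PESTPP-IES but greatly simplified. A hypothetical linear forward model maps each parameter realization to a simulated value; the ensemble cross-covariance supplies a Kalman-style gain that shifts realizations toward the observation. All numbers are illustrative assumptions, not any real calibration problem.

```python
import random

random.seed(1)

def forward(p):
    return 2.0 * p            # hypothetical linear forward model

obs = 10.0                    # observed target (true parameter would be 5.0)
obs_sd = 0.1                  # observation noise standard deviation

prior = [random.gauss(3.0, 1.0) for _ in range(200)]   # prior ensemble
sims = [forward(p) for p in prior]                     # simulated values

# Ensemble-estimated covariances give the Kalman-style gain.
p_mean = sum(prior) / len(prior)
y_mean = sum(sims) / len(sims)
n1 = len(prior) - 1
c_py = sum((p - p_mean) * (y - y_mean) for p, y in zip(prior, sims)) / n1
c_yy = sum((y - y_mean) ** 2 for y in sims) / n1

gain = c_py / (c_yy + obs_sd ** 2)
# Shift each realization toward a noise-perturbed observation.
posterior = [p + gain * (random.gauss(obs, obs_sd) - y)
             for p, y in zip(prior, sims)]

post_mean = sum(posterior) / len(posterior)
print(round(post_mean, 2))    # pulled from the prior mean toward the truth
```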
Human–induced Climate Change
Climate is the weather of a place averaged over an interval covering decades. Weather includes the daily events that occur in the atmosphere, and it changes across a much shorter period, like minutes to weeks. Three-decade averages of weather measures, called Climate Normals, provide a place- and period-specific climate description. Anthropogenic, or human-induced, drivers are pushing climate variations beyond the bounds of historical observations. Across the planet, warming is expected to reach 1.5 ˚C (2.7 ˚F) above pre-industrial levels between 2030 and 2052. These human-induced climate variations, with variability beyond that observed historically, are human–induced climate change.
Integrated Hydrological Model
This type of model simulates the full hydrologic cycle in terrestrial environments where both surface and subsurface water flow need to be represented and dynamically linked.
Kalman Filter
The Kalman filter is a digital filter and data assimilation (DA) algorithm that provides 'best' estimates of system state. It recursively estimates state variables in a noisy, linear, dynamical system by leveraging a series of measurements, in conjunction with initial state predictions from a forward model, to generate estimates of unknown variables. It requires a linear model of system state and a Gaussian-like distribution of measurement errors. Its estimates, or updates, combine a model prediction with a measurement using a weighted average, with more weight allocated to the estimate that has greater certainty. The result is estimates that tend to be more accurate than estimates based on a single measurement or simulation. As part of the update process, the joint probability distribution over the variables for each assimilation time is estimated. The Kalman filter is widely used in many technical and quantitative fields and can often be implemented in real time.
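The predict/update cycle described above can be sketched in one dimension for a random-walk state. The noise variances and measurement series are illustrative assumptions; note how the gain weights the prediction and measurement by their relative certainty, and how the state variance shrinks with each assimilated measurement.

```python
# Minimal one-dimensional Kalman filter for a random-walk state.

def kalman_step(x, p, z, q, r):
    """One predict/update cycle.
    x, p : prior state estimate and its variance
    z    : new measurement
    q, r : process and measurement noise variances
    """
    # Predict: random-walk model carries the state over; variance grows.
    x_pred, p_pred = x, p + q
    # Update: the Kalman gain is a certainty-weighted average.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0                      # initial guess and its variance
for z in [1.1, 0.9, 1.05, 0.95]:     # noisy measurements of a value near 1
    x, p = kalman_step(x, p, z, q=0.01, r=0.1)

print(round(x, 2), round(p, 3))      # estimate converges; variance shrinks
```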
Land Use and Land Cover Change
Land use and land cover (LULC) changes include modifications to land use and land cover as a result of economic activity and climate. Land cover change includes evolution or alteration in vegetation communities and soils. LULC change directly impacts the water budget because it influences transpiration, infiltration, and surface runoff.
Long Short-term Memory Network
Long short-term memory (LSTM) networks are a deep learning (DL) algorithm that employs sequences, or time series, as inputs, can produce sequences of predicted outputs, and can learn system dynamics because of time-related learning and prediction.
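The time-related learning noted above comes from the LSTM cell's gating mechanism, which can be written out explicitly for a single scalar unit. The weights below are arbitrary illustrative values, not trained ones; the sketch only shows how the forget, input, and output gates let the cell carry a memory state across a sequence.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h, c, w):
    """One time step of a scalar LSTM cell: input x, prior hidden h, cell c."""
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate state
    c_new = f * c + i * g             # cell state: memory carried forward
    h_new = o * math.tanh(c_new)      # hidden state: the step's output
    return h_new, c_new

# Arbitrary illustrative weights (all 0.5), not a trained network.
w = {k: 0.5 for k in ("wf", "uf", "bf", "wi", "ui", "bi",
                      "wo", "uo", "bo", "wg", "ug", "bg")}

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5, 1.0]:       # an input sequence (time series)
    h, c = lstm_step(x, h, c, w)

print(round(h, 3))                    # output after assimilating the sequence
```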
Machine Learning
A machine learning (ML) approach involves training a statistical algorithm, or 'machine', to 'learn' from data.
Model Representation Error
Model representation error accounts for differences between a simulation model's representation of reality and what is actually observed during a measurement. All models simplify continuous and infinite reality with discrete representations of the continuous. Consequently, all models generate representation error.
Null Space Sensitivity Analysis
Sensitivity is the variation in model solution values due to variability, or uncertainty, in one or more model input values. The null space is the region of model solution space for which changes to parameter values (during calibration or training) do not result in changes to the objective function that is minimized for training. Because parameter value change does not impact calibration and inverse-style selection of optimal parameter values, parameters in the null space can be set to any value (constrained by professional judgement), and the model will maintain calibration. Null space sensitivity analysis is an important precursor to model-related decision making.
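A tiny illustration of the null-space concept: in the hypothetical forward model below, the output depends only on the sum of two parameters, so moving the parameters along the direction (+d, −d) changes nothing the calibration objective can "see". Calibration cannot constrain the split between the two parameters, and professional judgement must.

```python
# Null-space sketch: the objective function is blind to parameter
# changes along the direction that leaves p1 + p2 unchanged.

def forward(p1, p2):
    return p1 + p2                    # hypothetical model output

def objective(p1, p2, target):
    """Squared residual minimized during calibration."""
    return (forward(p1, p2) - target) ** 2

target = 10.0
base = objective(6.0, 4.0, target)      # calibrated: residual is zero
shifted = objective(8.0, 2.0, target)   # moved along the null space

print(base, shifted)   # both zero: calibration is maintained either way
```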
Observation Error Model
Observation error models may be used in data assimilation (DA) to represent measurement, or observation, error and model representation error. Model representation error accounts for the differing representations of reality inherent to the measurements and to the 'forward' numerical model used in the assimilation. The goal of an observation error model is to assist in optimizing the bias–variance tradeoff inherent to DA, model calibration, and AI training.
Overfitting
Overfitting is the gap between training and testing accuracy, represented by prediction error, and generalization accuracy, represented by generalization error. It occurs because the optimization of internal weights, structure, and parameters seeks the best performance on the training and testing components. Training error consistently decreases with increases in model complexity and will typically drop to zero if complexity is increased sufficiently. This consistent decrease with increased complexity occurs as the model learns to reproduce the measurement error and noise in the training data set by increasing the degrees of freedom in the representation. Zero training error means that the model is overfit to the training data set and will typically generalize poorly.
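The zero-training-error failure mode above can be demonstrated with an interpolating polynomial, the maximum complexity the data will support. The made-up samples below come from a constant signal (y = 5) plus noise: the polynomial fits the noise exactly, then generalizes worse on a held-out point than a trivial mean predictor.

```python
# Overfitting sketch: exact interpolation gives zero training error
# but poor generalization compared with a simple baseline.

def lagrange(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [5.2, 4.7, 5.4, 4.6, 5.3]        # constant signal 5 plus noise

# Training error of the interpolant is exactly zero (it fits the noise).
train_err = sum((lagrange(xs, ys, x) - y) ** 2 for x, y in zip(xs, ys))

# Held-out point from the true signal: compare against a mean baseline.
x_new, y_new = 5.0, 5.0
mean_pred = sum(ys) / len(ys)
poly_err = abs(lagrange(xs, ys, x_new) - y_new)
mean_err = abs(mean_pred - y_new)

print(train_err, poly_err > mean_err)
```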
PEST++
PEST++ is a model independent tool set for enhancing decision making with environmental models. It includes tools for global sensitivity analysis; least-squares parameter estimation with integrated first-order, second-moment parameter and forecast uncertainty estimation; an iterative, localized ensemble smoother (PESTPP-IES); and a tool for management optimization under uncertainty.
Power Transformation
For effective statistical learning, or AI-ML training, individual features generally need to be transformed to resemble standard normally distributed data with zero mean (μ = 0) and unit variance (σ² = 1.0, where σ is the standard deviation). This transformation is called standardization. Power transformation is an advanced standardization approach that seeks to map data from any input distribution shape to a shape close to Gaussian.
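Standardization can be sketched directly, here preceded by a log transform (a member of the power family) to pull a right-skewed feature toward a Gaussian-like shape before rescaling. The sample values are illustrative, and a fitted power transformation (e.g., Box–Cox or Yeo–Johnson) would choose the exponent from the data rather than fixing a log.

```python
import math
import statistics

def standardize(values):
    """Rescale a feature to zero mean and unit (sample) variance."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

skewed = [1.0, 2.0, 4.0, 8.0, 16.0]           # right-skewed feature
transformed = [math.log(v) for v in skewed]    # log transform: evenly spaced

z = standardize(transformed)
print(round(statistics.mean(z), 9), round(statistics.stdev(z), 9))
```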
Probabilistic Risk Assessment
Probabilistic risk assessment (PRA) is a collection of techniques and methods that explicitly incorporate variability and uncertainty into risk analysis. PRA produces estimates of likelihoods for a range of consequence magnitudes rather than a single answer. If unaddressed, variability and uncertainty result in misestimates (either underprediction or overprediction) of risk. PRA is especially well suited to assessing future risk because it provides likelihoods for consequence magnitudes and can produce cones of uncertainty for consequence magnitude.
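A minimal PRA-style sketch: Monte Carlo sampling of two uncertain inputs produces a distribution of consequence magnitudes, so the result is a likelihood of exceeding a threshold rather than a single deterministic answer. The consequence function, distributions, and threshold are all illustrative assumptions.

```python
import random

random.seed(42)

def consequence(rainfall, capacity):
    """Hypothetical consequence magnitude: excess of driver over capacity."""
    return max(0.0, rainfall - capacity)

n = 10_000
outcomes = []
for _ in range(n):
    rainfall = random.gauss(100.0, 20.0)     # uncertain driver
    capacity = random.uniform(90.0, 120.0)   # uncertain system capacity
    outcomes.append(consequence(rainfall, capacity))

# Likelihood that the consequence exceeds a threshold of interest:
# a probability, not a single-valued risk estimate.
p_exceed = sum(1 for o in outcomes if o > 10.0) / n
print(round(p_exceed, 3))
```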
Risk
Risk related to decision making regarding complex, engineered systems includes scenarios, likelihoods, and consequences. A scenario, which is a sequence of events, decisions, and failures, describes how adverse consequences could occur and facilitates determination of the likelihood, or probability, for negative outcomes. Risk is then the probability for negative consequences, which is implicitly conditioned on the scenarios evaluated. Risk management is the reduction in frequency, or likelihood, of adverse scenarios, or accidents. Incorporating risk assessment into decision making requires that uncertainty be addressed and quantified through assignment of likelihoods or probabilities to consequences.
Saltwater Disposal Well
A saltwater disposal well (SDW) is a Class II UIC disposal well. Produced water is a byproduct of natural gas and oil production. This water is heavily polluted with salt, hydrocarbons, and industrial compounds, making it hazardous to the environment. A SDW injects the produced water, colloquially called salt water, deep into the ground for segregation from underground sources of drinking water (USDW).
Semi–implicit
The time integration algorithm in a computer model that simulates fluid movement in rivers, streams, estuaries, reservoirs, and oceans is semi-implicit if the gravitational terms in the momentum equations and the velocity divergence in the continuity equation are treated implicitly while the remaining terms in the system of equations are treated explicitly.
Semi-Lagrangian
An Eulerian framework is a grid or Cartesian framework where offset and distance are measured in terms of length from a reference location. A Lagrangian framework is a reference frame that moves along with an object or packet of fluid. Consequently, offset and distance are measured in terms of travel time and time integration of velocity history. A semi-Lagrangian approach is one where the frame of reference moves with a moving fluid but where important parameters and environment descriptions are sampled from the current, or past, location of the fluid particle on a spatially fixed, or Eulerian, reference frame.
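A one-step, one-dimensional advection sketch of the semi-Lagrangian idea: for each fixed (Eulerian) grid node, trace the fluid parcel back along the flow to its departure point, then interpolate the advected quantity there from the fixed grid. The grid, velocity, and concentration field are illustrative, and departure points are clamped at the upstream boundary for simplicity.

```python
def interp(grid, values, x):
    """Linear interpolation of values (on a uniform grid) at position x."""
    dx = grid[1] - grid[0]
    i = min(max(int((x - grid[0]) / dx), 0), len(grid) - 2)
    w = (x - grid[i]) / dx
    return (1.0 - w) * values[i] + w * values[i + 1]

grid = [0.0, 1.0, 2.0, 3.0, 4.0]
conc = [0.0, 10.0, 20.0, 10.0, 0.0]   # advected quantity on the fixed grid
u, dt = 1.0, 0.5                      # uniform velocity and time step

# Semi-Lagrangian step: sample the field at each node's departure point
# (clamped at the upstream boundary).
new_conc = [interp(grid, conc, max(grid[0], x - u * dt)) for x in grid]
print(new_conc)                       # the profile shifts downstream
```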
Severe Drought
Drought is a deficiency of precipitation over an extended period of time resulting in water shortage. Severe drought, according to the Standardized Precipitation Index (SPI), is a cumulative precipitation depth that maps to an SPI of -1.5 or less and is in the lowest five percent of observed SPI values.
Sustainability
Sustainability is meeting the needs of the present without compromising the ability of future generations to meet their needs. Inherent in the concept of sustainability is rationed exploitation and communal management of resources to ensure that today's activities do not significantly jeopardize quality of life in subsequent decades.
Uncertainty Analysis
Uncertainty means lack of knowledge. Implementation of an uncertainty analysis is dependent on the type of uncertainty. For historical and design conditions, 'exact value' uncertainty is analyzed in terms of accuracy (or bias) and precision (or variability) as part of calibration and validation using data assimilation (DA) techniques, like the calibration–constrained uncertainty analyses provided by PEST and PEST++. For future conditions and planning uncertainty, probabilistic risk assessment (PRA) techniques can be used to estimate likelihoods, or probabilities, for adverse consequence magnitude.
Underground Injection Control
In the United States (US), injection wells are regulated by the underground injection control (UIC) program, administered by the U.S. Environmental Protection Agency (USEPA), to protect underground sources of drinking water (USDW) from endangerment by setting minimum requirements for injection well systems and the subsurface sequestration environment. The goal of UIC requirements is ensuring that injected waste stays within the design/target sequestration zone, does not directly or indirectly migrate to a USDW, does not cause a public water system to violate drinking water standards, and does not adversely affect public health in some other manner. UIC Class II wells are used to inject fluids related to oil and gas (OG) exploration and production (E&P). Primary uses of Class II wells are enhanced recovery of OG and disposal of wastewater co-produced with OG.
Water Budget or Water Balance
A procedural, or accounting–style, calculation that estimates the balance among water storage in, water inputs to, and water outflows from a defined system. When the water budget is calculated for a watershed, incoming water is from precipitation, and the processes of evapotranspiration (ET), stream flow, and groundwater recharge generate outflows.
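The watershed accounting described above reduces to a single balance: the change in storage equals precipitation in minus the ET, stream flow, and recharge outflows. The annual depths below (in millimeters) are illustrative values, not data for any real watershed.

```python
# Simple annual watershed water-budget (water-balance) accounting.
precipitation = 900.0     # input (mm)
et = 550.0                # evapotranspiration outflow (mm)
streamflow = 250.0        # stream flow outflow (mm)
recharge = 80.0           # groundwater recharge outflow (mm)

# Change in storage closes the budget.
storage_change = precipitation - (et + streamflow + recharge)
print(storage_change)     # mm added to (positive) or drawn from storage
```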
Weather Attribution
Weather attribution is the determination of the relative likelihood or probability of a weather event occurring under two different climate descriptions. It has been used to analyze changes in likelihood, relative to undisturbed early 20th century conditions, for an observed drought or precipitation event under present-day human-induced climate change conditions.
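The attribution comparison above is often summarized as a probability ratio (risk ratio): how much more often the event threshold is exceeded under the present-day climate description than under the historical one. The exceedance counts below are invented solely for illustration.

```python
# Toy weather-attribution calculation via the probability ratio.
events_historical = 2      # threshold exceedances, early-20th-century record
years_historical = 100
events_present = 6         # exceedances, same-length present-day record
years_present = 100

p0 = events_historical / years_historical   # historical event probability
p1 = events_present / years_present         # present-day event probability

probability_ratio = p1 / p0
print(round(probability_ratio, 2))   # event is this many times more likely
```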
Weather Generator
A statistical model that simulates daily weather sequences designed to represent key statistical properties of meteorological and climate records. It provides a way to synthetically produce weather sequences that mimic a historical, or projected future, climate description.
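A minimal daily weather generator can be sketched with a two-state (wet/dry) Markov chain for precipitation occurrence and an exponential draw for wet-day depth. The transition probabilities and mean depth below are illustrative assumptions, not parameters fitted to any record; a real generator would estimate them from the historical or projected climate description being mimicked.

```python
import random

random.seed(7)

P_WET_GIVEN_DRY = 0.3     # chance a dry day is followed by a wet day
P_WET_GIVEN_WET = 0.6     # wet days tend to cluster
MEAN_DEPTH_MM = 8.0       # mean rainfall depth on a wet day

def generate(n_days):
    """Generate a synthetic daily rainfall-depth sequence (mm)."""
    series, wet = [], False
    for _ in range(n_days):
        p = P_WET_GIVEN_WET if wet else P_WET_GIVEN_DRY
        wet = random.random() < p
        series.append(random.expovariate(1.0 / MEAN_DEPTH_MM) if wet else 0.0)
    return series

rain = generate(365)
wet_fraction = sum(1 for r in rain if r > 0.0) / len(rain)
print(round(wet_fraction, 2))   # near the chain's stationary wet fraction
```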