{
"cells": [
{
"cell_type": "markdown",
"id": "3cc6eb88",
"metadata": {},
"source": [
"# Application of the ensemble Kalman filter (EnKF) to Lorenz's 1963 model\n",
"\n",
"$$\n",
"\\newcommand{\\myd}{\\mathrm{d}}\n",
"\\newcommand{\\statev}{\\mathbf{X}}\n",
"\\newcommand{\\lrayleigh}{{\\mathrm{r}}}\n",
"\\newcommand{\\rayleigh}{{\\mathrm{Ra}}}\n",
"\\newcommand{\\pr}{\\mathrm{Pr}}\n",
"\\newcommand{\\adj}{T}\n",
"\\newcommand{\\tstep}{\\Delta t}\n",
"$$\n",
"\n",
"The goal now is to get familiar with the working of a sequential assimilation scheme, running a so called twin experiment: A true, reference model trajectory $\\statev_i^t$ is generated ($t$ stands for true, $i$ for the discrete time index). The reference trajectory is used to construct a catalog of synthetic observations. These observations are then assimilated in order to correct a second model trajectory, which differs from the first one (the true one). In our case it will differ because we will assume a different initial condition, $\\statev_0 \\neq \\statev_0^t$.\n",
"\n",
"Twin experiments (also called OSSE, Observing System Assimilation Experiments) are a logical first step when implementing an assimilation scheme, since they allow to develop an understanding for the behaviour of the scheme, without the additional complexity which may arise from the inability of the forward model to represent some of the physics expressed in the observations.\n",
"\n",
"Today we will run these twin experiments using Lorenz's 1963 model, and we will resort to the ensemble Kalman filter, which is a standard sequential assimilation scheme. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4286a9d",
"metadata": {},
"outputs": [],
"source": [
"## Uncomment this line to make it interactive in JupyterLab!\n",
"#%matplotlib widget\n",
"\n",
"import numpy as np\n",
"import forwardModel as fw\n",
"import matplotlib.pyplot as plt\n",
"from myEnKF import myEnKF"
]
},
{
"cell_type": "markdown",
"id": "c336e324",
"metadata": {},
"source": [
"## Definition of the reference trajectory"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12b48cc7",
"metadata": {},
"outputs": [],
"source": [
"# We define the control parameters here \n",
"rayleigh = 35 # \n",
"prandtl = 10.\n",
"b = 8./3.\n",
"\n",
"#initial condition for the true reference trajectory\n",
"x0 = np.array( [0., 1., 2.], dtype=float )\n",
"\n",
"#integration time parameter\n",
"dt = 1.e-3 # This is time step size\n",
"T = 25. # Total integration time, can be as short as 10 to speed things up\n",
"n_steps = int( np.ceil( T / dt) )\n",
"time = np.linspace(0., T, n_steps + 1, endpoint=True) # array of discrete times\n",
"\n",
"#numerical integration given initial conditions and control parameters\n",
"xt = fw.forwardModel_r( x0, time, rayleigh, prandtl, b) "
]
},
{
"cell_type": "markdown",
"id": "c228fb0d",
"metadata": {},
"source": [
"## Creation of the catalog of synthetic observations \n",
"\n",
"This is where we specify which variables are observed (we can decide that 1, 2 or even all 3 of them are measured), how often, and how accurate these observations are. To this end, we add an observation noise to the true value of the field of interest. This noise is assumed to be Gaussian, with standard deviation $\\sigma_{obs}$. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b831492a",
"metadata": {},
"outputs": [],
"source": [
"## How often do we observe the true state? \n",
"dtobs = 0.5 # time between observations\n",
"# Which variables do we observe? \n",
"WhichVariablesAreObserved = np.array( [1, 1, 1], dtype=float )\n",
" # Determines which variables are available to\n",
" # the EnKF. For example:\n",
" # WhichVariablesAreObserved = [1 1 1]; \n",
" # means: X, Y, Z are observed\n",
" # WhichVariablesAreObserved = [1 0 1];\n",
" # means: X and Z are observed\n",
" # WhichVariablesAreObserved = [1 0 0];\n",
" # means: X is observed\n",
"sigobs = 2. # standard deviation of the observation noise\n",
"# We generate the synthetic data \n",
"# Construct observation matrix H\n",
"# ........................................................................\n",
"H = np.diag( WhichVariablesAreObserved )\n",
"\n",
"y_size = int( np.sum( WhichVariablesAreObserved ) )\n",
"H = np.zeros( (y_size, 3), dtype=float)\n",
"iy = 0\n",
"for ix in range(3):\n",
" if WhichVariablesAreObserved[ix] > 0:\n",
" H[iy,ix] = 1.\n",
" iy = iy + 1\n",
"\n",
"nobs = int( np.ceil( T / dtobs ) ) - 1 # number of times observations are performed\n",
" # no observation at t=0 \n",
"gap = int( dtobs/dt )# number of time steps between each observation \n",
"time_obs = time[gap::gap]\n",
"# Generate vector of observations\n",
"y = np.zeros( (y_size, nobs), dtype=float)\n",
"R = np.diag( np.tile(sigobs**2, y_size) )\n",
"sqrt_s = np.sqrt(R)\n",
"# y = Hxt \n",
"y = np.dot( H, xt[:,gap::gap] )\n",
"# compute observation error \n",
"noise = np.dot(sqrt_s , np.random.normal( loc=0., scale=1.0, size=np.shape(y) ) )\n",
"# y = Hxt + epsilon \n",
"y = y + noise"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99646fbf",
"metadata": {},
"outputs": [],
"source": [
"# Plot reference trajectory and the observations that will be fed to the EnKF \n",
"fig, ax = plt.subplots(nrows=3, ncols=1, sharex=True)\n",
"iobs = 0\n",
"for k, comp in enumerate ([\"X\",\"Y\",\"Z\"]):\n",
" ax[k].plot(time, xt[k,:])\n",
" if WhichVariablesAreObserved[k] > 0:\n",
" ax[k].errorbar(time_obs, y[iobs,:], yerr=sqrt_s[iobs,iobs], fmt='o', markersize=3, capsize=4, label='obs')\n",
" ax[k].legend(bbox_to_anchor=(1.01, 1),loc='upper right', frameon=True)\n",
" iobs = iobs + 1\n",
" ax[k].set_ylabel(comp)\n",
"ax[-1].set_xlabel('Time')\n",
"ax[-1].set_xlim(time[0],time[-1])\n",
"ax[0].set_title('Reference trajectory + observations')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "e56d02b1",
"metadata": {},
"source": [
"## Application of the Ensemble Kalman Filter\n",
"\n",
"We will now assimilate these observations into a model whose trajectory will be hopefully drawn towards the true one, trough the sequential assimilation of data. \n",
"\n",
"The ensemble, which is assumed to be centered on a zero initial condition, is characterized by \n",
"\n",
"+ its size ($N_e$ is the number of elements of the ensemble), \n",
"+ its initial spread that is also assumed to follow Gaussian statistics, with a standard deviation denoted by $\\sigma_{ens}$\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94dcb74f",
"metadata": {},
"outputs": [],
"source": [
"# Initialization of the Ensemble\n",
"Ne = 50 # Number of ensemble members \n",
"x0ens = np.array( [0.,0.,0.], dtype=float ) # The ensemble is centered on a zero initial condition\n",
"sigens = 10 # standard deviation of the ensemble\n",
"# initial covariance matrix\n",
"P0 =(sigens**2)*np.eye(3, dtype=float)"
]
},
{
"cell_type": "markdown",
"id": "c1ef1b83",
"metadata": {},
"source": [
"We have all the ingredients we need (physical model, observations, their statistics, initial ensemble) to apply the EnKF. \n",
"The python script `myEnKF.py` contains the implementation of the EnKF. It returns two numpy arrays:\n",
"+ `xEnKF`, which is the trajectory of the ensemble average - our guess of the trajectory\n",
"+ `x_ens`, which contains the individual trajectories of the $N_e$ ensemble members"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c92d209",
"metadata": {},
"outputs": [],
"source": [
"#Now we run the EnKF \n",
"xEnKF, x_ens = myEnKF( n_steps, dt, nobs, time_obs, gap, Ne, H, R, y, x0ens, P0, rayleigh, prandtl, b)"
]
},
{
"cell_type": "markdown",
"id": "288d3568",
"metadata": {},
"source": [
"Since we are dealing with a twin experiment, we know what the true dynamical trajectory is, and we can assess exactly how good the EnKF is performing, under the conditions that were prescribed above. \n",
"\n",
"The python code below is used to quantify the error, for the $X$, $Y$ and $Z$ components of the state vector considered separately, and the 3 components taken together. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c2b9510",
"metadata": {},
"outputs": [],
"source": [
"# Compute Euclidean error wrt true state which is known in this synthetic game\n",
"# cumulative error\n",
"cum_error_comp = np.sqrt( np.sum( (xt - xEnKF)**2, axis=1) ) \n",
"cum_error = np.sqrt( np.sum( cum_error_comp**2) )\n",
"norm_comp = np.sqrt( np.sum( xt**2, axis=1) ) \n",
"norm = np.sqrt( np.sum( norm_comp**2) )\n",
"print()\n",
"print(\" Relative errors in %\")\n",
"for k, comp in enumerate ([\"X\",\"Y\",\"Z\"]):\n",
" print(\" \"+(comp+\"-component: %4.1f \") % ( 100.*cum_error_comp[k]/norm_comp[k]) )\n",
"print()\n",
"print((\" 3-component: %4.1f \") % ( 100.*cum_error / norm ) )"
]
},
{
"cell_type": "markdown",
"id": "ffee314e",
"metadata": {},
"source": [
"We plot the results of the assimilation with the EnKF. For each component, the plots comprise the reference trajectory, the observations used for assimilation, and the EnKF estimate of the trajectory (the ensemble average). \n",
"The legend also features the relative erros that were just computed. \n",
"\n",
"By setting the plot_ensemble variable to True, you have the possibility to visualize individual trajectories as well (thin black lines). "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6804944a",
"metadata": {},
"outputs": [],
"source": [
"#Plot results\n",
"plot_ensemble = False # Change to True if you want to visualize ensemble members\n",
"fig, ax = plt.subplots(nrows=3, ncols=1, sharex=True)\n",
"iobs = 0\n",
"for k, comp in enumerate ([\"X\",\"Y\",\"Z\"]):\n",
" ax[k].plot(time, xt[k,:], label='true')\n",
" ax[k].plot(time, xEnKF[k,:], label='EnKF')\n",
" if plot_ensemble is True:\n",
" for e in range(Ne):\n",
" ax[k].plot(time, x_ens[k,:,e], c='k', lw=0.1, zorder=-4)\n",
" if WhichVariablesAreObserved[k] > 0:\n",
" ax[k].errorbar(time_obs, y[iobs,:], yerr=sqrt_s[iobs,iobs], fmt='o', markersize=3, capsize=4, label='obs')\n",
" iobs = iobs + 1\n",
" ax[k].legend(bbox_to_anchor=(0.9, 1),loc='upper left', frameon=True)\n",
" ax[k].set_ylabel(comp)\n",
" ax[k].title.set_text( (\"Error for \"+comp+\"-component: %4.1f %% \") % ( 100.*cum_error_comp[k]/norm_comp[k]) )\n",
"ax[-1].set_xlabel('Time')\n",
"ax[-1].set_xlim(time[0],time[-1])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "8f5cf581",
"metadata": {},
"source": [
"## Some questions \n",
"\n",
"*To address those questions, note that you must rerun the cells above as soon as you vary one parameter. \n",
"Easiest way to do it is probably to use the 'Restart kernel and run all cells' in the Kernel tab.*\n",
"\n",
"Q1: All other parameters remaining constant, find the maximum value of $\\sigma_{obs}$ which leads to an acceptable behaviour of the EnKF (after you define what you consider an acceptable behaviour of the EnKF). One significant digit is enough. "
]
},
{
"cell_type": "raw",
"id": "644ed034",
"metadata": {},
"source": [
"# your answer here below"
]
},
{
"cell_type": "markdown",
"id": "5b653641",
"metadata": {},
"source": [
"Q2: Now with all other parameters remaining constant (σobs being equal to the value you just found),find the largest value of $dt_{obs}$ (with 0.2 accuracy, say) which leads to an acceptable behaviour of the EnKF."
]
},
{
"cell_type": "raw",
"id": "b316a363",
"metadata": {},
"source": [
"# your answer here below\n"
]
},
{
"cell_type": "markdown",
"id": "94205034",
"metadata": {},
"source": [
"Q3: Is there a connection between this value and the typical time scale of the dynamics of the model?"
]
},
{
"cell_type": "raw",
"id": "9088b434",
"metadata": {},
"source": [
"# your answer here below \n"
]
},
{
"cell_type": "markdown",
"id": "e20f4972",
"metadata": {},
"source": [
"Q4: Set $dt_{obs}$ to half the maximum value you just found. Keeping all the other parameter constant, assume now that only X or Y is observed. Comment on the quality of the EnKF estimate based on either option."
]
},
{
"cell_type": "raw",
"id": "bcd9cd48",
"metadata": {},
"source": [
"# your answer here below\n"
]
},
{
"cell_type": "markdown",
"id": "4d2f6d78",
"metadata": {},
"source": [
"Q5: It may be that the quality of the results depend strongly on the variable which is observed\n",
"(X or Y). Would you have an explanation for this, based on a simple analysis of the equations \n",
"that govern the dynamics of the L63 model?"
]
},
{
"cell_type": "raw",
"id": "68991aeb",
"metadata": {},
"source": [
"# your answer here below\n"
]
},
{
"cell_type": "markdown",
"id": "a499b6f6",
"metadata": {},
"source": [
"Q6: All other parameters remaining constant, find the minimum number of elements of the ensemble $N_e$ which leads to a good behaviour of the EnKF."
]
},
{
"cell_type": "raw",
"id": "92441796",
"metadata": {},
"source": [
"# your answer here below\n"
]
},
{
"cell_type": "markdown",
"id": "dbe824c4",
"metadata": {},
"source": [
"Q7: What conclusions do you draw from practising from the EnKF?"
]
},
{
"cell_type": "raw",
"id": "9ac314dc",
"metadata": {},
"source": [
"# your answer here below"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}