{ "cells": [ { "cell_type": "markdown", "id": "3b3a4134", "metadata": {}, "source": [ "# K Index Calculation" ] }, { "cell_type": "code", "execution_count": null, "id": "2f9fdddc", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "# Install pooch (not currently in VRE)\n", "import sys\n", "!{sys.executable} -m pip install pooch\n", "\n", "import datetime as dt\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import pooch\n", "from viresclient import SwarmRequest\n", "import ipywidgets as widgets\n", "\n", "from warnings import filterwarnings\n", "filterwarnings(action=\"ignore\")\n", "\n", "# Data dependencies (pooch caches this in ~/.cache/pooch/)\n", "esk_k_ind_file = pooch.retrieve(\n", " \"https://raw.githubusercontent.com/MagneticEarth/IAGA_SummerSchool2019/master/data/external/k_inds/esk/2003.esk\",\n", " known_hash=\"233246e167a212cd1afa33ff2fe130fbc308cd2ae7971c6c2afcd363c9775c18\"\n", ")" ] }, { "cell_type": "markdown", "id": "c995e94d", "metadata": {}, "source": [ "## Calculating K-indices for a single observatory" ] }, { "cell_type": "markdown", "id": "7f85dff3", "metadata": {}, "source": [ "The K-index is a local geomagnetic activity index devised by Julius Bartels in 1938 to give a simple measure of the degree of geomagnetic disturbance during each 3-hour (UT) interval seen at a single magnetic observatory. Data from the observatory magnetometers are used to assign a number in the range 0-9 to each 3-hour interval, with K=0 indicating very little geomagnetic activity and K=9 representing an extreme geomagnetic storm. The K-index was introduced at the time of photographic recording, when magnetograms recorded variations in the horizontal geomagnetic field elements declination (D) and horizontal intensity (H), and in the vertical intensity (Z).\n", "\n", "To derive a K-index an observer would fit, __by eye__, a 'solar regular variation' ($S_R$) curve to the records of D and H and measure the range (maximum-minimum) of the deviation of the recording from the curve. The K-index was then assigned according to a conversion table, with the greatest range in D and H 'winning'. The north component (X) may be used instead of H, and the east component (Y) instead of D (X and Y will be used in the examples below and see http://isgi.unistra.fr/what_are_kindices.php for more details on the K-index). The vertical component Z is not used because it is liable to contamination by local induced currents.\n", "\n", "The conversion from range in nanoteslas to index is quasi-logarithmic. The conversion table varies with latitude in an attempt to normalise the K-index distribution for observatories at different latitudes. The table for Eskdalemuir is shown below.\n", "\n", "| K | 0 | 1 |2|3|4|5|6|7|8|9|\n", "| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n", "| Lower bound (nT) | 0 | 8 |15|30|60|105|180|300|500|750|\n", "\n", "This means that, for instance, K=2 if the measured range is in the interval \\[15, 30\\) nT.\n", "\n", "There was a long debate in IAGA Division V about algorithms that could adequately reproduce the K-indices that an experienced observer would assign. The algorithms and code approved by IAGA are available at the International Service for Geomagnetic Indices: http://isgi.unistra.fr/softwares.php." ] }, { "cell_type": "markdown", "id": "b6f91a4c", "metadata": {}, "source": [ "### Example" ] }, { "cell_type": "markdown", "id": "e29dfb61", "metadata": {}, "source": [ "In the following cells, we **illustrate** a possible approach. We assume the so-called regular daily variation $S_R$ is made up of 24h, 12h and 8h signals (and, possibly, higher harmonics). A Fourier analysis can be used to investigate this. The functions in the cell below calculate Fourier coefficients from a data sample of one-minute data values over a 24-hour UT interval, and then synthesise a smooth fit using the Fourier coefficients.\n", "\n", "For some days this simple approach to estimating $S_R$ seems to work well, on others it's obviously wrong. Think about another approach you might take.\n", "\n", "We then attempt to calculate K-indices for the day chosen by computing the Fourier series up the the number of harmonics selected by subtracting the synthetic harmonic signal from the data, then calculating 3-hr ranges and converting these into the corresponding K-index. The functions to do are also included in the following cell." ] }, { "cell_type": "code", "execution_count": null, "id": "8457dde2", "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "def fourier(v, nhar):\n", " npts = len(v)\n", " f = 2.0/npts\n", " t = np.linspace(0, npts, npts, endpoint=False)*2*np.pi/npts\n", " vmn = np.mean(v)\n", " v = v - vmn\n", " cofs = [0]*(nhar+1)\n", " cofs[0] = (vmn,0)\n", " for i in range(1,nhar+1):\n", " c, s = np.cos(i*t), np.sin(i*t)\n", " cofs[i] = (np.dot(v,c)*f, np.dot(v,s)*f)\n", " return (cofs)\n", "\n", "def fourier_synth(cofs, npts):\n", " nt = len(cofs)\n", " syn = np.zeros(npts)\n", " t = np.linspace(0, npts, npts, endpoint=False)*2*np.pi/npts\n", " for n in range(1, nt):\n", " for j in range(npts):\n", " syn[j] += cofs[n][0]*np.cos(n*t[j]) + cofs[n][1]*np.sin(n*t[j])\n", " return (syn)\n", "\n", "# Define K-index conversion table for ESK\n", "K_conversions = {\n", " f\"K{level}\": level_bin \n", " for level, level_bin in enumerate(\n", " (0, 8, 15, 30, 60, 105, 180, 300, 500, 750)\n", " )\n", "}\n", "# Define reverse mapping\n", "nT_to_K = {v: k for k, v in K_conversions.items()}\n", "\n", "def K_calc(d, synd, Kb=K_conversions):\n", " tmp = np.ptp((d-synd).reshape(8,180), axis=1)\n", " return(list(np.digitize(tmp, bins=list(Kb.values()), right=False)-1))\n", "\n", "def load_official_K(filepath=esk_k_ind_file):\n", " df = pd.read_csv(filepath, skiprows=0, header=None, delim_whitespace=True, \n", " parse_dates=[[2,1,0]], index_col=0)\n", " df = df.drop(3, axis=1)\n", " df.index.name='Date'\n", " df.columns = ['00','03','06','09','12','15','18','21']\n", " return(df)\n", "\n", "def load_ESK_2003():\n", " request = SwarmRequest()\n", " request.set_collection(f\"SW_OPER_AUX_OBSM2_:ESK\", verbose=False)\n", " request.set_products(measurements=[\"B_NEC\", \"IAGA_code\"])\n", " data = request.get_between(\n", " dt.datetime(2003, 1, 1),\n", " dt.datetime(2004, 1, 1),\n", " )\n", " df = data.as_dataframe(expand=True).drop(\n", " columns=[\"Spacecraft\"]\n", " )\n", " df = df.rename(columns={f\"B_NEC_{i}\": j for i, j in zip(\"NEC\", \"XYZ\")})\n", " return df" ] }, { "cell_type": "markdown", "id": "d494a5ce", "metadata": {}, "source": [ "First, load in (X, Y, Z) one-minute data from Eskdalemuir for 2003 into a pandas dataframe." ] }, { "cell_type": "code", "execution_count": null, "id": "5aab5cbc", "metadata": {}, "outputs": [], "source": [ "df_obs = load_ESK_2003()\n", "df_obs.head()" ] }, { "cell_type": "markdown", "id": "3b89a4c5", "metadata": {}, "source": [ "Load the official K index data (available from <http://www.geomag.bgs.ac.uk/data_service/data/magnetic_indices/k_indices>) to compare with later." ] }, { "cell_type": "code", "execution_count": null, "id": "fc7f2a86", "metadata": {}, "outputs": [], "source": [ "df_K_official = load_official_K()\n", "df_K_official.head()" ] }, { "cell_type": "markdown", "id": "bfbdd63a", "metadata": {}, "source": [ "Evaluate K indices for a given day:\n", "- For each of $X$ and $Y$:\n", "- Perform a Fourier analysis on the data to find the regular daily variation, $S_R$\n", "- Over each 3-hour interval, find the maximum differences from $S_R$\n", "- Convert from nT to $K$ using the conversion table for ESK\n", "- Pick the greater of $K(X)$ and $K(Y)$ and compare with the official K index" ] }, { "cell_type": "code", "execution_count": null, "id": "24586edc", "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "def analyse_day(day=dt.date(2003, 1, 1), n_harmonics=3, df=df_obs, df_K_official=df_K_official):\n", " \"\"\"Generate figure illustrating the K index calculation for a given day\"\"\"\n", " # Select given day\n", " _df = df.loc[day.isoformat()]\n", " _df_K = df_K_official.loc[day.isoformat()]\n", " # Select X & Y data and remove daily mean\n", " x = (_df[\"X\"] - _df[\"X\"].mean()).values\n", " y = (_df[\"Y\"] - _df[\"Y\"].mean()).values\n", " # Perform Fourier analysis of X & Y separately\n", " xcofs = fourier(x, n_harmonics)\n", " synx = fourier_synth(xcofs, len(x))\n", " ycofs = fourier(y, n_harmonics)\n", " syny = fourier_synth(ycofs, len(y))\n", "\n", " # Build plot\n", " t = np.linspace(0, 1440, 1440, endpoint=False)/60\n", " fig, axes = plt.subplots(2, 1, figsize=(15, 10), sharex=True)\n", " # Plot X & Y data with approximated variation\n", " axes[0].plot(t, x, color=\"tab:blue\", alpha=0.5)\n", " axes[0].plot(t, synx, color=\"tab:blue\", label=\"X\")\n", " axes[0].plot(t, y, color=\"tab:red\", alpha=0.5)\n", " axes[0].plot(t, syny, color=\"tab:red\", label=\"Y\")\n", " # Plot the differences\n", " axes[1].plot(t, (x-synx), color=\"tab:blue\")\n", " axes[1].plot(t, (y-syny), color=\"tab:red\")\n", "\n", " # Find and plot min/max bounds over 3-hourly intervals\n", " minX = np.min((x-synx).reshape(8, 180), axis=1)\n", " maxX = np.max((x-synx).reshape(8, 180), axis=1)\n", " minY = np.min((y-syny).reshape(8, 180), axis=1)\n", " maxY = np.max((y-syny).reshape(8, 180), axis=1)\n", " t_3hours = np.linspace(0, 1440, 9, endpoint=True)/60\n", " axes[1].fill_between(t_3hours, list(minX)+[0], list(maxX)+[0], step=\"post\", color=\"tab:blue\", alpha=0.5)\n", " axes[1].fill_between(t_3hours, list(minY)+[0], list(maxY)+[0], step=\"post\", color=\"tab:red\", alpha=0.5)\n", " \n", " # Determine K index from each of X & Y\n", " K_X = np.digitize((maxX-minX), bins=list(K_conversions.values()), right=False) - 1\n", " K_Y = np.digitize((maxY-minY), bins=list(K_conversions.values()), right=False) - 1\n", " for i, (K_X_i, K_Y_i) in enumerate(zip(K_X, K_Y)):\n", " # Display determined K from X & Y\n", " px = i*3\n", " py = axes[1].get_ylim()[1]\n", " axes[1].annotate(\n", " f\"K(X): {K_X_i}\", (px, py), xytext=(30, 18),\n", " textcoords=\"offset pixels\", color=\"tab:blue\", size=12,\n", " )\n", " axes[1].annotate(\n", " f\"K(Y): {K_Y_i}\", (px, py), xytext=(30, 3),\n", " textcoords=\"offset pixels\", color=\"tab:red\", size=12,\n", " )\n", " # Display comparison with the official K index\n", " K_ours = max(K_X_i, K_Y_i)\n", " K_official = _df_K[i]\n", " axes[1].annotate(\n", " f\"{K_ours}\\n{K_official}\",\n", " (i*3, axes[1].get_ylim()[0]), xytext=(40, -70), textcoords=\"offset pixels\"\n", " )\n", " axes[1].annotate(\n", " f\"Determined K:\\nOfficial K:\",\n", " (0, axes[1].get_ylim()[0]), xytext=(-80, -70), textcoords=\"offset pixels\"\n", " )\n", "\n", " # Finalise figure\n", " for ax in axes:\n", " ax.grid()\n", " ax.xaxis.set_ticks(np.arange(0, 27, 3))\n", " axes[1].set_ylabel(\"Residuals [nT]\")\n", " axes[1].set_xlabel(\"UT [hour]\")\n", " axes[0].set_ylabel(\"[nT]\")\n", " axes[0].legend(loc=\"upper right\")\n", " fig.suptitle(f\"ESK: {day.isoformat()}\", y=0.9)\n", "\n", " return fig, axes\n", "\n", "def make_widgets_K_index_calc():\n", " day = widgets.SelectionSlider(\n", " options=[t.date() for t in pd.date_range(dt.date(2003, 1, 1), dt.date(2003, 12, 31))],\n", " description=\"Select day:\", layout=widgets.Layout(width='700px')\n", " )\n", "# day = widgets.DatePicker(value=dt.date(2003, 1, 1), description=\"Select day:\")\n", " n_harmonics = widgets.SelectionSlider(options=range(1, 11), value=3, description=\"# harmonics:\")\n", " return widgets.VBox(\n", " [day,\n", " n_harmonics,\n", " widgets.interactive_output(\n", " analyse_day,\n", " {\"day\": day, \"n_harmonics\": n_harmonics}\n", " )],\n", " )\n", "\n", "make_widgets_K_index_calc()" ] }, { "cell_type": "markdown", "id": "7b5d6400", "metadata": {}, "source": [ "## Statistics of the K index" ] }, { "cell_type": "markdown", "id": "db1b4ffc", "metadata": {}, "source": [ "We will use the official K index from ESK to probe some statistics through the year 2003.\n", "\n", "Histograms of the K indices for each 3-hour period:" ] }, { "cell_type": "code", "execution_count": null, "id": "11fe4d32", "metadata": {}, "outputs": [], "source": [ "axes = df_K_official.hist(\n", " figsize=(12, 12), bins=range(11), sharey=True, align=\"left\", rwidth=0.8,\n", ")\n", "plt.suptitle('ESK 2003: Distribution of K-indices for each 3-hour interval')\n", "axes[-1, 0].set_ylabel(\"Frequency\")\n", "axes[-1, 0].set_xlabel(\"K\");" ] }, { "cell_type": "markdown", "id": "9296e6cc", "metadata": {}, "source": [ "... plotted side by side:" ] }, { "cell_type": "code", "execution_count": null, "id": "2363f5ef", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(7,7))\n", "plt.hist(df_K_official.values, bins=range(11), align='left')\n", "plt.legend(df_K_official.columns)\n", "plt.ylabel('Number of 3-hour intervals')\n", "plt.xlabel('K');" ] }, { "cell_type": "markdown", "id": "51ceda99", "metadata": {}, "source": [ "... and stacked together:" ] }, { "cell_type": "code", "execution_count": null, "id": "e232c08e", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(7,7))\n", "plt.hist(df_K_official.values, bins=range(11), stacked=True, align='left', rwidth=0.8)\n", "plt.legend(df_K_official.columns)\n", "plt.ylabel('Number of 3-hour intervals')\n", "plt.xlabel('K');" ] }, { "cell_type": "markdown", "id": "6c7aa148", "metadata": {}, "source": [ "We also compute a daily sum of the K-indices for the 2003 file, and list days with high and low summed values. Note that this summation is not really appropriate because the K-index is quasi-logarithmic, however, this is a common simple measure of quiet and disturbed days. (These might be interesting days for you to look at.)" ] }, { "cell_type": "code", "execution_count": null, "id": "365757c4", "metadata": {}, "outputs": [], "source": [ "df_K_official['Ksum'] = df_K_official.sum(axis=1)\n", "Ksort = df_K_official.sort_values('Ksum')\n", "print('Quiet days: \\n\\n', Ksort.head(10), '\\n\\n')\n", "print('Disturbed days: \\n\\n', Ksort.tail(10))" ] }, { "cell_type": "markdown", "id": "a58f982f", "metadata": {}, "source": [ "## Note on the Fast Fourier Transform" ] }, { "cell_type": "markdown", "id": "d6b4f9ce", "metadata": {}, "source": [ "In the examples above we computed Fourier coefficients in the 'traditional' way, so that if $F(t)$ is a Fourier series representation of $f(t)$, then,\n", "\n", "$$\n", "\\begin{align}\n", "F(t) &= A_o+\\sum_{n=1}^N A_n \\cos\\left(\\frac{2\\pi nt}{T}\\right)+B_n \\sin\\left(\\frac{2\\pi nt}{T}\\right)\n", "\\end{align}\n", "$$\n", "\n", "where $T$ is the fundamental period of $F(t)$. The $A_n$ and $B_n$ are estimated by\n", "\n", "$$\n", "\\begin{align}\n", "A_o&=\\frac{1}{T}\\int_0^T f(t) dt\\\\\n", "A_n&=\\frac{2}{T}\\int_0^T f(t)\\cos\\left(\\frac{2\\pi nt}{T}\\right) dt\\\\\n", "B_n&=\\frac{2}{T}\\int_0^T f(t)\\sin\\left(\\frac{2\\pi nt}{T}\\right) dt\n", "\\end{align}\n", "$$\n", "\n", "With $N$ samples of digital data, the integral for $A_n$ may be replaced by the summation\n", "\n", "$$\n", "\\begin{align}\n", "A_n&=\\frac{2}{T}\\sum_{j=0}^{N-1} f_j\\cos\\left(\\frac{2\\pi nj\\Delta t}{T}\\right) \\Delta t\\\\\n", "&=\\frac{2}{N}\\sum_{j=0}^{N-1} f_j\\cos\\left(\\frac{2\\pi nj}{N}\\right)\n", "\\end{align}\n", "$$\n", "\n", "where the sampling interval $\\Delta t$ is given by $T = N \\Delta t$ and $f_j = f(j \\Delta t)$. A similar expression applies for the $B_n$, and these are the coefficients returned by the function _fourier_ above.\n", "\n", "The fast Fourier transform (FFT) offers a computationally efficient means of finding the Fourier coefficients. The conventions for the FFT and its inverse (IFFT) vary from package to package. In the _scipy.fftpack_ package, the FFT of a sequence $x_n$ of length $N$ is defined as\n", "\n", "$$\n", "\\begin{align}\n", "y_k&=\\sum_{n=0}^{N-1} x_n\\exp\\left(-\\frac{2\\pi i\\thinspace kn}{N}\\right)\\\\\n", "&=\\sum_{n=0}^{N-1} x_n\\left(\\cos\\left(\\frac{2\\pi \\thinspace kn}{N}\\right)-i\\sin\\left(\\frac{2\\pi \\thinspace kn}{N}\\right)\\right)\n", "\\end{align}\n", "$$\n", "\n", "with the inverse defined as,\n", "\n", "$$\n", "\\begin{align}\n", "x_n&=\\frac{1}{N}\\sum_{k=0}^{N-1} y_k\\exp\\left(\\frac{2\\pi i\\thinspace kn}{N}\\right)\\\\\n", "\\end{align}\n", "$$\n", "\n", "(The _scipy_ documentation is perhaps a little confusing here because it explains the order of the $y_n$ as being $y_1,y_2, \\dots y_{N/2-1}$ as corresponding to increasing positive frequency and $y_{N/2}, y_{N/2+1}, \\dots y_{N-1}$ as ordered by decreasing negative frequency, for $N$ even. See: https://docs.scipy.org/doc/scipy/reference/tutorial/fft.html.)\n", "\n", "The interpretation is that if $y_k=a_k+ib_k$ then will have (for $N$ even), $y_{N-k} = a_k-ib_k$ and so\n", "\n", "$$\n", "\\begin{align}\n", "a_k&=\\frac{1}{2}\\text{Re}\\left(y_k+y_{N-k}\\right)\\\\\n", "b_k&=\\frac{1}{2}\\text{Im}\\left(y_k-y_{N-k}\\right)\n", "\\end{align}\n", "$$\n", "\n", "\n", "and so we expect the relationship to the digitised Fourier series coefficients returned by the function _fourier_ defined above to be,\n", "\n", "$$\n", "\\begin{align}\n", "A_k&=\\phantom{-}\\frac{1}{N}\\text{Re}\\left(a_k+a_{N-k}\\right)\\\\\n", "B_k&=-\\frac{1}{N}\\text{Im}\\left(b_k-b_{N-k}\\right)\n", "\\end{align}\n", "$$\n", "\n", "The following shows the equivalence between the conventional Fourier series approach and the FFT." ] }, { "cell_type": "code", "execution_count": null, "id": "4ee6e2a1", "metadata": {}, "outputs": [], "source": [ "from scipy.fftpack import fft\n", "\n", "# Compute the fourier series as before\n", "_df = df_obs.loc[\"2003-01-01\"]\n", "x = (_df[\"X\"] - _df[\"X\"].mean()).values\n", "xcofs = fourier(x, 3)\n", "# Compute using scipy FFT\n", "npts = len(x)\n", "xfft = fft(x)\n", "\n", "# Compare results for the 24-hour component\n", "k = 1\n", "print('Fourier coefficients: \\n', f'A1 = {xcofs[1][0]} \\n', f'B1 = {xcofs[1][1]} \\n')\n", "print('scipy FFT outputs: \\n', f'a1 = {np.real(xfft[k]+xfft[npts-k])/npts} \\n', \\\n", " f'b1 = {-np.imag(xfft[k]-xfft[npts-k])/npts} \\n')" ] }, { "cell_type": "markdown", "id": "c1c0f023", "metadata": {}, "source": [ "## References" ] }, { "cell_type": "markdown", "id": "92c986a8", "metadata": {}, "source": [ "Menvielle, M. et al. (1995) ‘Computer production of K indices: review and comparison of methods’, Geophysical Journal International. Oxford University Press, 123(3), pp. 866–886. doi: [10.1111/j.1365-246X.1995.tb06895.x](https://doi.org/10.1111/j.1365-246X.1995.tb06895.x)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }