{ "cells": [ { "cell_type": "markdown", "id": "ff9b6fac", "metadata": {}, "source": [ "# 08: Ancillary variables\n", "\n", "In science, you often want to publish more than just your primary data variables. You might also want to include secondary variables that are related to them or that provide further context.\n", "\n", "For example, you might have sea water chlorophyll A data from water samples taken at different depths. You might also want to publish:\n", "* the volume of each water sample\n", "* other values you measured in order to compute the chlorophyll A values\n", "* quality flags\n", "\n", "In the CF conventions, these variables are referred to as ancillary data, and the following section of the CF conventions is dedicated to them:\n", "https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#ancillary-data\n", "\n", "In this tutorial, we will look at how to include ancillary data in a CF-NetCDF file and how to encode the relationships between variables in a machine-understandable way.\n", "\n", "## Basic example without ancillary data" ] }, { "cell_type": "code", "execution_count": 3, "id": "a7306929", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "Dimensions: (depth: 5)\n", "Coordinates:\n", " * depth (depth) int64 10 20 30 40 50\n", "Data variables:\n", " Chlorophyll_A (depth) float64 0.411 0.152 0.067 0.017 0.014" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import xarray as xr\n", "\n", "depths = [10,20,30,40,50]\n", "chlorophyll_a = [0.411,0.152,0.067,0.017,0.014]\n", "\n", "xrds = xr.Dataset(\n", " coords={\n", " 'depth': depths\n", " },\n", " data_vars={\n", " 'Chlorophyll_A': ('depth', chlorophyll_a)\n", " } \n", ")\n", "\n", "xrds['Chlorophyll_A'].attrs = {\n", " 'standard_name': 'mass_concentration_of_chlorophyll_a_in_sea_water',\n", " 'long_name': 'Mass concentration of chlorophyll a in sea water derived from water samples from Niskin bottles',\n", " 'units': 'μg L-1',\n", " 'coverage_content_type': 'physicalMeasurement'\n", "}\n", "xrds['depth'].attrs = {\n", " 'standard_name': 'depth',\n", " 'long_name': 'Sea water depth',\n", " 'units': 'meters',\n", " 'coverage_content_type': 'coordinate',\n", " 'positive': 'down'\n", "}\n", "xrds" ] }, { "cell_type": "markdown", "id": "6ec53b6e", "metadata": {}, "source": [ "## Assigning quality or status flags\n", "\n", "Quality or status flags give the user information about the quality of the data. You can read about how to encode flags in this section of the CF conventions (see examples 3.4 to 3.8):\n", "https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#ancillary-data\n", "\n", "We need to create a new variable for the flags.\n", "\n", "Flags are stored as numbers, with the meaning of each number stored in a variable attribute. The terms in *flag_meanings* are separated by spaces, so don't include spaces within any of the terms you use! There should be exactly one value in *flag_values* for each term in *flag_meanings*.
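The rules above can be checked in plain Python before the attributes are attached to a variable. This is a minimal sketch: *flag_lookup* is a hypothetical helper written for this tutorial, not part of xarray or the CF conventions. It splits *flag_meanings* on whitespace (CF defines it as a blank-separated list), checks that there is exactly one term per flag value, and returns a dictionary mapping each value to its meaning.

```python
def flag_lookup(flag_values, flag_meanings):
    # Hypothetical helper: CF defines flag_meanings as a blank-separated list,
    # so splitting on whitespace should give one term per flag value
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError(
            f'{len(flag_values)} flag_values but {len(meanings)} flag_meanings'
        )
    if len(set(flag_values)) != len(flag_values):
        raise ValueError('flag_values contains duplicate values')
    # Map each flag value to its meaning, e.g. 2 -> probably_good_data
    return dict(zip(flag_values, meanings))


chla_flag_possible_values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
chla_flag_meanings = (
    'no_qc_performed good_data probably_good_data '
    'bad_data_that_are_potentially_correctable bad_data value_changed '
    'value_below_detection nominal_value interpolated_value missing_value'
)

lookup = flag_lookup(chla_flag_possible_values, chla_flag_meanings)
print(lookup[2])  # probably_good_data

# A term containing a space is split in two, so the lengths no longer match
try:
    flag_lookup([0, 1], 'no_qc_performed good data')
except ValueError as err:
    print(err)  # 2 flag_values but 3 flag_meanings
```

Raising an error, rather than relying on *zip* alone, is deliberate: *zip* silently stops at the shorter list and would hide the mismatch.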
" ] }, { "cell_type": "code", "execution_count": 1, "id": "a942a457", "metadata": {}, "outputs": [], "source": [ "chla_flag_possible_values = [0,1,2,3,4,5,6,7,8,9]\n", "chla_flag_meanings = \"no_qc_performed good_data probably_good_data bad_data_that_are_potentially_correctable bad_data value_changed value_below_detection nominal_value interpolated_value missing_value\"" ] }, { "cell_type": "markdown", "id": "96a656ec", "metadata": {}, "source": [ "For example, a value of 2 means *probably_good_data*.\n", "\n", "You might wonder which conventions these quality flag values and meanings adhere to. In this case, we are following the OceanSITES Manual v1.4:\n", "http://www.oceansites.org/docs/oceansites_data_format_reference_manual.pdf\n", "\n", "However, other quality flag conventions exist.\n", "\n", "Now let's create a variable for the quality flags, with one flag for each *Chlorophyll_A* value." ] }, { "cell_type": "code", "execution_count": 13, "id": "93511e5e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "Dimensions: (depth: 5)\n", "Coordinates:\n", " * depth (depth) int64 10 20 30 40 50\n", "Data variables:\n", " Chlorophyll_A (depth) float64 0.411 0.152 0.067 0.017 0.014\n", " Chlorophyll_A_quality_flags (depth) int64 1 1 1 2 1" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chla_flags = [1,1,1,2,1] # Same length as Chlorophyll_A variable\n", "\n", "xrds['Chlorophyll_A_quality_flags'] = ('depth', chla_flags)\n", "\n", "xrds" ] }, { "cell_type": "markdown", "id": "ad00e25c", "metadata": {}, "source": [ "Now we need to state that the new *Chlorophyll_A_quality_flags* variable is related to the *Chlorophyll_A* variable. We do this with the *ancillary_variables* attribute of *Chlorophyll_A*." ] }, { "cell_type": "code", "execution_count": 10, "id": "afaa4d8d", "metadata": {}, "outputs": [], "source": [ "xrds['Chlorophyll_A'].attrs['ancillary_variables'] = \"Chlorophyll_A_quality_flags\"" ] }, { "cell_type": "markdown", "id": "3ba86208", "metadata": {}, "source": [ "Finally, we need to add metadata to the ancillary variable to describe it. There are many standard names for different types of flags. Search for *flag* in the standard name table to find a suitable *standard_name*:\n", "https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html\n", "\n", "The CF conventions also define the standard variable attributes *flag_values* and *flag_meanings*. The *valid_range* attribute is often used here too, to explicitly show that any values outside that range are invalid. Alternatively, you could use *valid_min* and *valid_max*." ] }, { "cell_type": "code", "execution_count": 14, "id": "bf1a9760", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array([1, 1, 1, 2, 1])\n", "Coordinates:\n", " * depth (depth) int64 10 20 30 40 50\n", "Attributes:\n", " long_name: Chlorophyll A quality flag\n", " standard_name: quality_flag\n", " flag_values: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n", " flag_meanings: no_qc_performed good_data probably_good_data bad_data_tha...\n", " valid_range: [0, 9]\n", " _FillValue: -127" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Metadata for the 'Chlorophyll_A_quality_flags' variable\n", "xrds['Chlorophyll_A_quality_flags'].attrs = {\n", " 'long_name': 'Chlorophyll A quality flag',\n", " 'standard_name': 'quality_flag',\n", " 'flag_values': chla_flag_possible_values,\n", " 'flag_meanings': chla_flag_meanings,\n", " 'valid_range': [0,9],\n", " 'coverage_content_type': 'qualityInformation',\n", " '_FillValue': -127\n", "}\n", "\n", "xrds['Chlorophyll_A_quality_flags']\n" ] }, { "cell_type": "markdown", "id": "667e8976", "metadata": {}, "source": [ "Make sure you also refer to the conventions your quality flags follow in the *Conventions* global attribute, for example:" ] }, { "cell_type": "code", "execution_count": 16, "id": "c9f14c79", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "Dimensions: (depth: 5)\n", "Coordinates:\n", " * depth (depth) int64 10 20 30 40 50\n", "Data variables:\n", " Chlorophyll_A (depth) float64 0.411 0.152 0.067 0.017 0.014\n", " Chlorophyll_A_quality_flags (depth) int64 1 1 1 2 1\n", "Attributes:\n", " Conventions: CF-1.8, ACDD-1.3, OceanSITES Manual 1.4" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xrds.attrs['Conventions'] = 'CF-1.8, ACDD-1.3, OceanSITES Manual 1.4'\n", "xrds" ] }, { "cell_type": "markdown", "id": "278d806e", "metadata": {}, "source": [ "## Retrieving only good-quality data\n", "\n", "Suppose we want to retrieve only the good-quality *Chlorophyll_A* data, i.e. the values where *Chlorophyll_A_quality_flags* equals 1 (*good_data*). The *where* method keeps the values that meet the condition and replaces the rest with NaN:" ] }, { "cell_type": "code", "execution_count": 19, "id": "0aeace83", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array([0.411, 0.152, 0.067, nan, 0.014])\n", "Coordinates:\n", " * depth (depth) int64 10 20 30 40 50\n", "Attributes:\n", " standard_name: mass_concentration_of_chlorophyll_a_in_sea_water\n", " long_name: Mass concentration of chlorophyll a in sea water ...\n", " units: μg L-1\n", " coverage_content_type: physicalMeasurement\n", " ancillary_variables: Chlorophyll_A_quality_flags" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "good_quality_chlorophyll_a = xrds['Chlorophyll_A'].where(xrds['Chlorophyll_A_quality_flags'] == 1)\n", "good_quality_chlorophyll_a" ] }, { "cell_type": "markdown", "id": "f3f3de31", "metadata": {}, "source": [ "To remove the masked depths entirely instead of keeping them as NaN, set *drop=True*:" ] }, { "cell_type": "code", "execution_count": 20, "id": "ba921026", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array([0.411, 0.152, 0.067, 0.014])\n", "Coordinates:\n", " * depth (depth) int64 10 20 30 50\n", "Attributes:\n", " standard_name: mass_concentration_of_chlorophyll_a_in_sea_water\n", " long_name: Mass concentration of chlorophyll a in sea water ...\n", " units: μg L-1\n", " coverage_content_type: physicalMeasurement\n", " ancillary_variables: Chlorophyll_A_quality_flags" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "good_quality_chlorophyll_a = xrds['Chlorophyll_A'].where(xrds['Chlorophyll_A_quality_flags'] == 1, drop=True)\n", "good_quality_chlorophyll_a" ] }, { "cell_type": "markdown", "id": "0802c13f", "metadata": {}, "source": [ "## Other ancillary data\n", "\n", "We can add other ancillary variables in the same way. For example, here we add the volume of sea water filtered for each sample:" ] }, { "cell_type": "code", "execution_count": 21, "id": "dd6f4c39", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "Dimensions: (depth: 5)\n", "Coordinates:\n", " * depth (depth) int64 10 20 30 40 50\n", "Data variables:\n", " Chlorophyll_A (depth) float64 0.411 0.152 0.067 0.017 0.014\n", " Chlorophyll_A_quality_flags (depth) int64 1 1 1 2 1\n", " Filtered_volume (depth) float64 0.8 1.2 0.7 0.8 1.0\n", "Attributes:\n", " Conventions: CF-1.8, ACDD-1.3, OceanSITES Manual 1.4" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "filtered_volumes = [0.8,1.2,0.7,0.8,1.0]\n", "xrds['Filtered_volume'] = ('depth', filtered_volumes)\n", "\n", "# Multiple ancillary variables are separated by spaces\n", "xrds['Chlorophyll_A'].attrs['ancillary_variables'] = \"Chlorophyll_A_quality_flags Filtered_volume\"\n", "\n", "xrds['Filtered_volume'].attrs = {\n", " 'long_name': 'Volume of sea water filtered to measure the Chlorophyll A values',\n", " 'units': 'L',\n", " 'coverage_content_type': 'auxiliaryInformation',\n", " '_FillValue': -1\n", "}\n", "\n", "xrds" ] }, { "cell_type": "markdown", "id": "8e994299", "metadata": {}, "source": [ "More work is needed to expand the CF conventions to standardise ancillary data. At the time of writing, there is no *standard_name* for the volume of sea water filtered.\n", "\n", "This is where the scientific community can help!\n", "\n", "You can suggest new standard names by raising an issue in this GitHub repository:\n", "https://github.com/cf-convention/discuss/issues\n", "\n", "Follow these guidelines when constructing standard names:\n", "https://cfconventions.org/Data/cf-standard-names/docs/guidelines.html\n", "\n", "## How to cite this course\n", "\n", "If you think this course contributed to the work you are doing, consider citing it in your list of references. Here is a recommended citation:\n", "\n", "Marsden, L. (2024, April 19). NetCDF in Python - from beginner to pro. Zenodo. 
https://doi.org/10.5281/zenodo.10997447\n", "\n", "And you can navigate to the publication and export the citation in different styles and formats by clicking the icon below.\n", "\n", "[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10997447.svg)](https://doi.org/10.5281/zenodo.10997447)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }