08: Ancillary variables#
In science you often don’t just want to publish your data variables. You might want to include extra or secondary variables that are related or provide further context to your primary data variables.
For example, you might have sea water chlorophyll A data taken from water samples at different depths. You might also want to publish:
the volume of your water sample
other values you have measured in order to compute the chlorophyll A values
quality flags
In the CF conventions, these variables are referred to as ancillary data, and this section of the CF conventions is dedicated to them: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#ancillary-data
In this tutorial, we will look at how you can include ancillary data in a CF-NetCDF file and encode how the variables relate to each other in a machine-understandable way.
Basic example without ancillary data#
import xarray as xr

depths = [10, 20, 30, 40, 50]
chlorophyll_a = [0.411, 0.152, 0.067, 0.017, 0.014]

xrds = xr.Dataset(
    coords={
        'depth': depths
    },
    data_vars={
        'Chlorophyll_A': ('depth', chlorophyll_a)
    }
)

xrds['Chlorophyll_A'].attrs = {
    'standard_name': 'mass_concentration_of_chlorophyll_a_in_sea_water',
    'long_name': 'Mass concentration of chlorophyll a in sea water derived from water samples from Niskin bottles',
    'units': 'μg L-1',
    'coverage_content_type': 'physicalMeasurement'
}

xrds['depth'].attrs = {
    'standard_name': 'depth',
    'long_name': 'Sea water depth',
    'units': 'meters',
    'coverage_content_type': 'coordinate',
    'positive': 'down'
}

xrds
<xarray.Dataset>
Dimensions:        (depth: 5)
Coordinates:
  * depth          (depth) int64 10 20 30 40 50
Data variables:
    Chlorophyll_A  (depth) float64 0.411 0.152 0.067 0.017 0.014
Assigning quality or status flags#
Quality or status flags tell the user about the quality or status of the data. You can read about how to encode them in this section of the CF conventions (see examples 3.4 to 3.8): https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#ancillary-data
We need to create a new variable for the flags.
Flags are stored as numbers, with the meaning of each number stored in a variable attribute. The terms in flag_meanings are separated by spaces, so don’t include spaces within any of the terms you use! The number of flag values and the number of terms in flag_meanings should be the same.
chla_flag_possible_values = [0,1,2,3,4,5,6,7,8,9]
chla_flag_meanings = "no_qc_performed good_data probably_good_data bad_data_that_are_potentially_correctable bad_data value_changed value_below_detection nominal_value interpolated_value missing_value"
So for example, a value of 2 means probably_good_data.
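One way to check that the flag values and meanings line up is to split the flag_meanings string on spaces and pair each term with its value. This is only a minimal sketch; the names meanings and flag_lookup are illustrative and are not used elsewhere in this tutorial.

```python
chla_flag_possible_values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
chla_flag_meanings = (
    "no_qc_performed good_data probably_good_data "
    "bad_data_that_are_potentially_correctable bad_data value_changed "
    "value_below_detection nominal_value interpolated_value missing_value"
)

# flag_meanings is a single space-separated string, so split it into terms
meanings = chla_flag_meanings.split()

# There should be exactly one meaning per flag value
assert len(meanings) == len(chla_flag_possible_values)

# Pair each value with its meaning (flag_lookup is an illustrative name)
flag_lookup = dict(zip(chla_flag_possible_values, meanings))
print(flag_lookup[2])  # probably_good_data
```

If a term accidentally contained a space, the split would produce too many terms and the length check would fail.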
You might wonder which conventions these quality flag values and meanings adhere to. In this case, we are following the OceanSITES Manual v 1.4. http://www.oceansites.org/docs/oceansites_data_format_reference_manual.pdf
However, other quality flag conventions exist.
Now let’s create a variable for the quality flags.
chla_flags = [1,1,1,2,1] # Same length as Chlorophyll_A variable
xrds['Chlorophyll_A_quality_flags'] = ('depth', chla_flags)
xrds
<xarray.Dataset>
Dimensions:                      (depth: 5)
Coordinates:
  * depth                        (depth) int64 10 20 30 40 50
Data variables:
    Chlorophyll_A                (depth) float64 0.411 0.152 0.067 0.017 0.014
    Chlorophyll_A_quality_flags  (depth) int64 1 1 1 2 1
Now we need to state that the new Chlorophyll_A_quality_flags variable is related to the Chlorophyll_A variable. We do this with the ancillary_variables attribute of the primary variable.
xrds['Chlorophyll_A'].attrs['ancillary_variables'] = "Chlorophyll_A_quality_flags"
Finally we need to add our metadata to the ancillary variable to describe it. There are a lot of standard names for different types of flags. Search for flag here to find a suitable standard_name for you. https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
The CF conventions also have standardised variable attributes you can use for flags: flag_values and flag_meanings. You often see the valid_range attribute used here too, to explicitly show that any values outside of that range are invalid. You could use valid_min and valid_max instead.
# Metadata for the 'Chlorophyll_A_quality_flags' variable
xrds['Chlorophyll_A_quality_flags'].attrs = {
    'long_name': 'Chlorophyll A quality flag',
    'standard_name': 'quality_flag',
    'flag_values': chla_flag_possible_values,
    'flag_meanings': chla_flag_meanings,
    'valid_range': [0, 9],
    'coverage_content_type': 'qualityInformation',
    '_FillValue': -127
}

xrds['Chlorophyll_A_quality_flags']
<xarray.DataArray 'Chlorophyll_A_quality_flags' (depth: 5)>
array([1, 1, 1, 2, 1])
Coordinates:
  * depth    (depth) int64 10 20 30 40 50
Attributes:
    long_name:              Chlorophyll A quality flag
    standard_name:          quality_flag
    flag_values:            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    flag_meanings:          no_qc_performed good_data probably_good_data bad_...
    valid_range:            [0, 9]
    coverage_content_type:  qualityInformation
    _FillValue:             -127
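As a rough sketch of the valid_min and valid_max alternative, the two bounds can be set as separate attributes in place of valid_range (use one form or the other rather than both). The small dataset is rebuilt here so the snippet runs on its own, and the in_range check is an illustrative addition, not something the CF conventions require.

```python
import xarray as xr

# Rebuild a small dataset holding only the flag variable
xrds = xr.Dataset(
    coords={"depth": [10, 20, 30, 40, 50]},
    data_vars={"Chlorophyll_A_quality_flags": ("depth", [1, 1, 1, 2, 1])},
)

flags = xrds["Chlorophyll_A_quality_flags"]

# valid_min and valid_max together play the role of valid_range
flags.attrs["valid_min"] = 0
flags.attrs["valid_max"] = 9

# Illustrative check that every flag lies within the declared bounds
in_range = (flags >= flags.attrs["valid_min"]) & (flags <= flags.attrs["valid_max"])
print(bool(in_range.all()))
```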
Make sure you refer to the conventions you are following for your quality flags in your Conventions global attribute, for example:
xrds.attrs['Conventions'] = 'CF-1.8, ACDD-1.3, OceanSITES Manual 1.4'
xrds
<xarray.Dataset>
Dimensions:                      (depth: 5)
Coordinates:
  * depth                        (depth) int64 10 20 30 40 50
Data variables:
    Chlorophyll_A                (depth) float64 0.411 0.152 0.067 0.017 0.014
    Chlorophyll_A_quality_flags  (depth) int64 1 1 1 2 1
Attributes:
    Conventions:  CF-1.8, ACDD-1.3, OceanSITES Manual 1.4
Retrieving only good_quality data#
Suppose we want to retrieve only the Chlorophyll_A data flagged as good_data, where Chlorophyll_A_quality_flags = 1. The where method replaces all other values with NaN:
good_quality_chlorophyll_a = xrds['Chlorophyll_A'].where(xrds['Chlorophyll_A_quality_flags'] == 1)
good_quality_chlorophyll_a
<xarray.DataArray 'Chlorophyll_A' (depth: 5)>
array([0.411, 0.152, 0.067, nan, 0.014])
Coordinates:
  * depth    (depth) int64 10 20 30 40 50
Attributes:
    standard_name:          mass_concentration_of_chlorophyll_a_in_sea_water
    long_name:              Mass concentration of chlorophyll a in sea water ...
    units:                  μg L-1
    coverage_content_type:  physicalMeasurement
    ancillary_variables:    Chlorophyll_A_quality_flags
To drop the NaN values instead, pass drop=True:
good_quality_chlorophyll_a = xrds['Chlorophyll_A'].where(xrds['Chlorophyll_A_quality_flags'] == 1, drop=True)
good_quality_chlorophyll_a
<xarray.DataArray 'Chlorophyll_A' (depth: 4)>
array([0.411, 0.152, 0.067, 0.014])
Coordinates:
  * depth    (depth) int64 10 20 30 50
Attributes:
    standard_name:          mass_concentration_of_chlorophyll_a_in_sea_water
    long_name:              Mass concentration of chlorophyll a in sea water ...
    units:                  μg L-1
    coverage_content_type:  physicalMeasurement
    ancillary_variables:    Chlorophyll_A_quality_flags
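The same approach can be extended to accept more than one flag value, for example both good_data (1) and probably_good_data (2). The sketch below uses the isin method for this. Note that the flag values here are hypothetical and differ from those used above, so that one sample is flagged as bad_data (4) and is filtered out; the names acceptable_flags and usable_chlorophyll_a are also only illustrative.

```python
import xarray as xr

xrds = xr.Dataset(
    coords={"depth": [10, 20, 30, 40, 50]},
    data_vars={
        "Chlorophyll_A": ("depth", [0.411, 0.152, 0.067, 0.017, 0.014]),
        # Hypothetical flags for illustration: the sample at 30 m is bad_data (4)
        "Chlorophyll_A_quality_flags": ("depth", [1, 2, 4, 1, 1]),
    },
)

# Flag values we are willing to keep: good_data and probably_good_data
acceptable_flags = [1, 2]

# Boolean mask that is True wherever the flag is one of the accepted values
mask = xrds["Chlorophyll_A_quality_flags"].isin(acceptable_flags)

# Keep only the accepted samples and drop the rest
usable_chlorophyll_a = xrds["Chlorophyll_A"].where(mask, drop=True)
print(usable_chlorophyll_a["depth"].values)
```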
Other ancillary data#
We can write other ancillary variables in a similar way. For example:
filtered_volumes = [0.8,1.2,0.7,0.8,1.0]
xrds['Filtered_volume'] = ('depth', filtered_volumes)
# Multiple ancillary variables separated by spaces
xrds['Chlorophyll_A'].attrs['ancillary_variables'] = "Chlorophyll_A_quality_flags Filtered_volume"
xrds['Filtered_volume'].attrs = {
    'long_name': 'Volume of sea water filtered to measure the Chlorophyll A values',
    'units': 'L',
    'coverage_content_type': 'auxiliaryInformation',
    '_FillValue': -1
}
xrds
<xarray.Dataset>
Dimensions:                      (depth: 5)
Coordinates:
  * depth                        (depth) int64 10 20 30 40 50
Data variables:
    Chlorophyll_A                (depth) float64 0.411 0.152 0.067 0.017 0.014
    Chlorophyll_A_quality_flags  (depth) int64 1 1 1 2 1
    Filtered_volume              (depth) float64 0.8 1.2 0.7 0.8 1.0
Attributes:
    Conventions:  CF-1.8, ACDD-1.3, OceanSITES Manual 1.4
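Because ancillary_variables is a space-separated string of variable names, software reading the file can follow it to find the related variables. Here is a minimal sketch of how that might look; the dataset is rebuilt so the snippet runs on its own, and the names ancillary_names, missing and subset are illustrative.

```python
import xarray as xr

xrds = xr.Dataset(
    coords={"depth": [10, 20, 30, 40, 50]},
    data_vars={
        "Chlorophyll_A": ("depth", [0.411, 0.152, 0.067, 0.017, 0.014]),
        "Chlorophyll_A_quality_flags": ("depth", [1, 1, 1, 2, 1]),
        "Filtered_volume": ("depth", [0.8, 1.2, 0.7, 0.8, 1.0]),
    },
)
xrds["Chlorophyll_A"].attrs["ancillary_variables"] = (
    "Chlorophyll_A_quality_flags Filtered_volume"
)

# Split the space-separated attribute into individual variable names
ancillary_names = xrds["Chlorophyll_A"].attrs["ancillary_variables"].split()

# Every listed name should exist as a variable in the same dataset
missing = [name for name in ancillary_names if name not in xrds.variables]
print(ancillary_names)
print(missing)

# Select the primary variable together with its ancillary variables
subset = xrds[["Chlorophyll_A"] + ancillary_names]
```

An empty missing list indicates that every name in the attribute points to a variable that exists in the dataset.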
More work needs to be done to expand the CF conventions to standardise ancillary data. At the time of writing, a standard_name for the volume of sea water filtered does not exist.
This is where the scientific community can help!
New standard names can be suggested by raising an issue in this GitHub repository: cf-convention/discuss#issues
Follow these guidelines for constructing standard names: https://cfconventions.org/Data/cf-standard-names/docs/guidelines.html
How to cite this course#
If you think this course contributed to the work you are doing, consider citing it in your list of references. Here is a recommended citation:
Marsden, L. (2024, April 19). NetCDF in Python - from beginner to pro. Zenodo. https://doi.org/10.5281/zenodo.10997447
You can also navigate to the publication using the DOI above and export the citation in different styles and formats.