09: Cells and cell methods#
Sometimes a data value does not represent a point, but instead some coordinate range that we can call a cell. Some examples are:
- Maximum temperatures within each month
- Values in latitude/longitude bins
- Mean sea ice salinity from melting a chunk of an ice core between two depth values
In this tutorial we will look at some examples of how you can encode the limits of each cell in CF-NetCDF.
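To make the idea of a cell concrete before the worked examples, here is a minimal sketch for the latitude bins mentioned in the list above. The bin edges are assumed purely for illustration and the variable names are hypothetical. Each cell is described by a pair of limits, so the limits for all cells form an array with one row per cell and two columns.

```python
import numpy as np

# Hypothetical bin edges in degrees north (assumed for illustration)
lat_edges = np.array([60.0, 70.0, 80.0, 90.0])

# One row per cell: [lower limit, upper limit]
lat_bounds = np.column_stack([lat_edges[:-1], lat_edges[1:]])

# A representative coordinate value for each cell; here the midpoint
lat_centres = lat_bounds.mean(axis=1)

print(lat_bounds.shape)
print(lat_centres)
```

The midpoint is only one possible choice for the coordinate value; any value inside the cell could be used, provided the limits are stored alongside it.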
Maximum monthly temperatures#
Let’s start with the basic code to create our xarray object for a time series of temperature values.
Units of months since a date or years since a date are not recommended in CF because months and years do not have a fixed length (for example, in leap years), which can cause confusion. Instead, we can use days since 1970-01-01 (the epoch). Here we use the first day of each month in 2023 as the time coordinate:
import datetime
import xarray as xr
import calendar
import numpy as np
start_days_since_epoch = []
year = 2023
epoch = datetime.date(1970, 1, 1)
for month in range(1, 13):
    first_day_of_month = datetime.date(year, month, 1)
    time_difference = (first_day_of_month - epoch).days
    start_days_since_epoch.append(time_difference)
print(start_days_since_epoch)
[19358, 19389, 19417, 19448, 19478, 19509, 19539, 19570, 19601, 19631, 19662, 19692]
maximum_monthly_temperatures = [4.6, 5.2, 7.1, 12.3, 17.8, 21.3, 24.6, 22.8, 19.0, 14.2, 8.8, 6.1]
xrds = xr.Dataset(
    coords={
        'time': start_days_since_epoch
    },
    data_vars={
        'Temperature': ('time', maximum_monthly_temperatures)
    }
)
xrds['Temperature'].attrs = {
    'standard_name': 'air_temperature',
    'long_name': 'Maximum air temperatures per month',
    'units': 'degC',  # CF expects a UDUNITS-compatible unit string
    'coverage_content_type': 'physicalMeasurement'
}
xrds['time'].attrs = {
    'standard_name': 'time',
    'long_name': 'time in months',
    'units': 'days since 1970-01-01',
    'coverage_content_type': 'coordinate'
}
xrds
<xarray.Dataset> Size: 192B
Dimensions:      (time: 12)
Coordinates:
  * time         (time) int64 96B 19358 19389 19417 19448 ... 19631 19662 19692
Data variables:
    Temperature  (time) float64 96B 4.6 5.2 7.1 12.3 17.8 ... 19.0 14.2 8.8 6.1
But our values are not representative of just one day; they are maximum values for the month! How can we let the user know this in an unambiguous, machine-readable way? Remember: the long name is free text, so we can't rely on it.
We need to include some bounds to define where each month begins and ends. Each cell should run from midnight on the 1st of the month until midnight on the 1st of the following month.
year = 2023
epoch = datetime.date(1970, 1, 1)
# List to store the start and end times of each cell
time_bounds = []
for month in range(1, 13):
    # Start of the current month
    first_day_of_month = datetime.date(year, month, 1)
    start_days_since_epoch = (first_day_of_month - epoch).days
    # Start of the next month
    if month == 12:  # Handle December
        next_month = datetime.date(year + 1, 1, 1)
    else:
        next_month = datetime.date(year, month + 1, 1)
    end_days_since_epoch = (next_month - epoch).days
    # Add to bounds
    time_bounds.append((start_days_since_epoch, end_days_since_epoch))
print(time_bounds)
[(19358, 19389), (19389, 19417), (19417, 19448), (19448, 19478), (19478, 19509), (19509, 19539), (19539, 19570), (19570, 19601), (19601, 19631), (19631, 19662), (19662, 19692), (19692, 19723)]
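Before attaching bounds like these to a dataset, it can be worth checking that the cells are contiguous and have the expected widths. The following is a minimal sketch of such a check, not part of the original workflow. It rebuilds the bounds with the standard library's calendar.monthrange, and the names contiguous and cell_widths are introduced here for illustration only.

```python
import calendar
import datetime

year = 2023
epoch = datetime.date(1970, 1, 1)

# Rebuild the bounds as (start, end) pairs in days since the epoch
time_bounds = []
for month in range(1, 13):
    start = (datetime.date(year, month, 1) - epoch).days
    # calendar.monthrange returns (weekday of first day, number of days in month)
    days_in_month = calendar.monthrange(year, month)[1]
    time_bounds.append((start, start + days_in_month))

# Each cell should end exactly where the next one begins
contiguous = all(
    time_bounds[i][1] == time_bounds[i + 1][0]
    for i in range(len(time_bounds) - 1)
)

# Width of each cell in days
cell_widths = [end - start for start, end in time_bounds]

print(contiguous)
print(cell_widths)
```

For a non-leap year such as 2023, the widths should sum to 365 days.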
Now we need to create a new variable in our xarray object for the time bounds. This variable needs two dimensions: time, of course, and another dimension that we will call nv (number of vertices). The nv dimension has a size of 2 in this case because each cell is described by only its minimum and maximum time.
# Assigning a variable that uses a new dimension name creates the nv dimension.
# Only time_bounds gets the nv dimension; Temperature stays one-dimensional.
xrds['time_bounds'] = (['time', 'nv'], time_bounds)
xrds
<xarray.Dataset> Size: 384B
Dimensions:      (time: 12, nv: 2)
Coordinates:
  * time         (time) int64 96B 19358 19389 19417 19448 ... 19631 19662 19692
Dimensions without coordinates: nv
Data variables:
    Temperature  (time) float64 96B 4.6 5.2 7.1 12.3 17.8 ... 19.0 14.2 8.8 6.1
    time_bounds  (time, nv) int64 192B 19358 19389 19389 ... 19692 19692 19723
Now we need to add metadata to make this machine readable. Below we are saying that the time_bounds variable defines the bounds of the time variable. We are using cell_methods to state that values are the maximums within each cell with respect to time.
xrds['time'].attrs['bounds'] = 'time_bounds'
xrds['Temperature'].attrs['cell_methods'] = 'time: maximum'
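As a rough sketch of how someone reading the data could use these two attributes, the example below follows the bounds attribute from the coordinate to its bounds variable and converts each cell's limits back to calendar dates. It builds a small stand-in dataset with only two cells so that it runs on its own. The names ds, bounds_name and cells are introduced here for illustration.

```python
import datetime
import xarray as xr

# Small stand-in dataset with two monthly cells (January and February 2023)
ds = xr.Dataset(
    coords={'time': [19358, 19389]},
    data_vars={
        'Temperature': ('time', [4.6, 5.2]),
        'time_bounds': (['time', 'nv'], [[19358, 19389], [19389, 19417]]),
    },
)
ds['time'].attrs = {'units': 'days since 1970-01-01', 'bounds': 'time_bounds'}
ds['Temperature'].attrs = {'cell_methods': 'time: maximum'}

# Follow the 'bounds' attribute from the coordinate to its bounds variable
bounds_name = ds['time'].attrs['bounds']
bounds = ds[bounds_name].values

# Convert each (start, end) pair from days since the epoch to ISO dates
epoch = datetime.date(1970, 1, 1)
cells = []
for start, end in bounds:
    start_date = epoch + datetime.timedelta(days=int(start))
    end_date = epoch + datetime.timedelta(days=int(end))
    cells.append((start_date.isoformat(), end_date.isoformat()))

print(ds['Temperature'].attrs['cell_methods'])
print(cells)
```

Software can work this out without reading the free-text long name: cell_methods says which statistic was applied along time, and the bounds variable says over which interval.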
A full list of possible cell methods that you can use is provided here: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#appendix-cell-methods
This includes maximum, minimum, mean, median, standard_deviation and more.
Ice core data#
Here is a full example for ice core data. You might have a pandas DataFrame like the one we create below.
import pandas as pd
# Sample data
data = {
    'minimum_depth': [0, 10, 20, 30],
    'maximum_depth': [10, 20, 30, 38],
    'salinity': [35.4, 35.6, 35.5, 35.2]
}
# Creating the DataFrame
df = pd.DataFrame(data)
df
| | minimum_depth | maximum_depth | salinity |
|---|---|---|---|
| 0 | 0 | 10 | 35.4 |
| 1 | 10 | 20 | 35.6 |
| 2 | 20 | 30 | 35.5 |
| 3 | 30 | 38 | 35.2 |
Here is the code to get this into an xarray object with bounds:
xrds = xr.Dataset(
    coords={
        'depth': df['minimum_depth']  # Could be any value within the cell
    },
    data_vars={
        'Salinity': ('depth', df['salinity'])
    }
)
# Assigning depth_bounds creates the nv dimension; Salinity stays one-dimensional
depths_2d_array = df[['minimum_depth', 'maximum_depth']].to_numpy()
xrds['depth_bounds'] = (['depth', 'nv'], depths_2d_array)
xrds['Salinity'].attrs = {
    'standard_name': 'sea_ice_salinity',
    'long_name': 'Salinity of sea ice measured by melting chunks of an ice core',
    'units': '1e-3',
    'coverage_content_type': 'physicalMeasurement',
    'cell_methods': 'depth: mean'
}
xrds['depth'].attrs = {
    'standard_name': 'depth',
    'long_name': 'depth in ice core',
    'units': 'cm',
    'positive': 'down',
    'coverage_content_type': 'coordinate',
    'bounds': 'depth_bounds'
}
xrds
<xarray.Dataset> Size: 128B
Dimensions:       (depth: 4, nv: 2)
Coordinates:
  * depth         (depth) int64 32B 0 10 20 30
Dimensions without coordinates: nv
Data variables:
    Salinity      (depth) float64 32B 35.4 35.6 35.5 35.2
    depth_bounds  (depth, nv) int64 64B 0 10 10 20 20 30 30 38
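The depth coordinate above uses the minimum depth of each cell, but as the code comment notes, any value within the cell would do. The sketch below shows one alternative, the midpoint of each cell, together with the thickness of each cell. The names mid_depth and thickness are introduced here for illustration.

```python
import pandas as pd

# Same depth limits as in the sample DataFrame above
df = pd.DataFrame({
    'minimum_depth': [0, 10, 20, 30],
    'maximum_depth': [10, 20, 30, 38],
})

# Midpoint of each cell, as one possible choice of coordinate value
mid_depth = (df['minimum_depth'] + df['maximum_depth']) / 2

# Thickness of each cell, in the same units as the depths
thickness = df['maximum_depth'] - df['minimum_depth']

print(mid_depth.tolist())
print(thickness.tolist())
```

Whichever coordinate value is chosen, the bounds variable still carries the exact limits of each cell, so the last, thinner chunk of the core is described correctly.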
How to cite this course#
If you think this course contributed to the work you are doing, consider citing it in your list of references. Here is a recommended citation:
Marsden, L. (2024, April 19). NetCDF in Python - from beginner to pro. Zenodo. https://doi.org/10.5281/zenodo.10997447
You can also navigate to the publication via the DOI link above and export the citation in different styles and formats.