04: Creating a CF-NetCDF file#
from IPython.display import YouTubeVideo
YouTubeVideo('a5QvdSffqrM')
In this session we will create a basic NetCDF file that is compliant with the Attribute Convention for Data Discovery (ACDD) and the Climate and Forecast (CF) conventions.
Firstly, let’s import the modules that we will work with.
import xarray as xr
from datetime import datetime as dt
import numpy as np
import pandas as pd
Initialising your xarray dataset object#
The first step is to create an empty xarray dataset object.
xrds = xr.Dataset()
xrds
<xarray.Dataset> Dimensions: () Data variables: *empty*
Right away we can see the object has a defined structure with dimensions and variables. A key feature of a NetCDF file is that there is a defined structure so your data and metadata will always be in the same place within the file. This makes it easier for a machine to read it. We will add more types of data and metadata to this object as we go.
Dimensions and coordinate variables#
Dimensions define the shape of your data. Variables (your data) can be assigned one or more dimensions. A dimension in most cases is a spatial or temporal dimension (e.g. time, depth, latitude, longitude) but could also be something else (e.g. iteration, number of vertices for data representative of cells).
Dimensions tell you how many points you have for each coordinate. Coordinate variables tell you what the values for those points are.
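This distinction can be seen directly in a minimal xarray object (the values here are just for illustration):

```python
import xarray as xr

# A dimension named 'time' with three points, and a coordinate variable
# of the same name holding the values of those points
ds = xr.Dataset(coords={'time': [10, 20, 30]})

n_points = ds.sizes['time']   # the dimension: how many points there are
values = ds['time'].values    # the coordinate variable: what those points are
```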
Let’s imagine a few simple scenarios. I’ll initialise new xarray dataset objects each time.
Our data are a time series with 10 points in time#
time = [0,1,2,3,4,5,6,7,8,9]
xrds = xr.Dataset(
coords = {
'time': time
}
)
xrds
<xarray.Dataset> Dimensions: (time: 10) Coordinates: * time (time) int64 0 1 2 3 4 5 6 7 8 9 Data variables: *empty*
The object now has a dimension (time) of length 10 and a coordinate variable, also named time, defined along that dimension. The values are all integers. Here are some more examples.
Your times are timestamps#
There are specific recommendations on how time should be stored in NetCDF-CF files. I will try to explain briefly here, and there is a nice explanation here too: https://www.unidata.ucar.edu/software/netcdf/time/recs.html
It is most common to have a dimension named “time” as well as a coordinate variable with the same name. Let’s discuss the variable first.
The “time” variable has units that count from a user-defined origin, for example “hours since 2020-01-01 00:00 UTC” or “days since 2014-01-01”. The units may be in years, days, seconds, nanoseconds, etc. Whilst this approach may seem strange at first glance, it allows the times to be stored in conventional numerical formats such as integers or floats, and to our desired precision. This is much more efficient than storing a long timestamp string for each coordinate.
Some software knows how to interpret this and will convert the data into timestamps when you extract the data from a CF-NetCDF file.
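As a sketch of that decoding, done by hand with plain numpy (the origin and unit here are illustrative assumptions, not values from this course's data):

```python
import numpy as np

# Decoding "hours since 2020-01-01 00:00" manually
origin = np.datetime64('2020-01-01T00:00:00')
hours_since_origin = np.array([0, 6, 12, 24])

# Add the numeric offsets to the origin to recover full timestamps
timestamps = origin + hours_since_origin.astype('timedelta64[h]')
```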
Let’s start by creating an array of timestamps
start = np.datetime64('2022-01-01T00:00:00')
end = start + np.timedelta64(24, 'h')
timestamps = np.arange(start, end, np.timedelta64(1, 'h'))
timestamps
array(['2022-01-01T00:00:00', '2022-01-01T01:00:00',
'2022-01-01T02:00:00', '2022-01-01T03:00:00',
'2022-01-01T04:00:00', '2022-01-01T05:00:00',
'2022-01-01T06:00:00', '2022-01-01T07:00:00',
'2022-01-01T08:00:00', '2022-01-01T09:00:00',
'2022-01-01T10:00:00', '2022-01-01T11:00:00',
'2022-01-01T12:00:00', '2022-01-01T13:00:00',
'2022-01-01T14:00:00', '2022-01-01T15:00:00',
'2022-01-01T16:00:00', '2022-01-01T17:00:00',
'2022-01-01T18:00:00', '2022-01-01T19:00:00',
'2022-01-01T20:00:00', '2022-01-01T21:00:00',
'2022-01-01T22:00:00', '2022-01-01T23:00:00'],
dtype='datetime64[s]')
In CF-NetCDF, time is stored in units such as seconds since 2022-01-01T00:00:00Z or hours since 2022-01-01T00:00:00Z.
seconds_since_start = (timestamps - start).astype('int')
seconds_since_start
array([ 0, 3600, 7200, 10800, 14400, 18000, 21600, 25200, 28800,
32400, 36000, 39600, 43200, 46800, 50400, 54000, 57600, 61200,
64800, 68400, 72000, 75600, 79200, 82800])
hours_since_start = (timestamps - start).astype('timedelta64[h]').astype('int')
hours_since_start
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
Finally, let’s add the time values to a coordinate variable in an xarray object.
xrds = xr.Dataset(
coords = {
'time': seconds_since_start
}
)
xrds
<xarray.Dataset> Dimensions: (time: 24) Coordinates: * time (time) int64 0 3600 7200 10800 14400 ... 72000 75600 79200 82800 Data variables: *empty*
Our data have both a depth dimension and a time dimension#
depths = [5, 10, 20, 30, 50]
ds = xr.Dataset(
coords = {
'depth': depths,
'time': seconds_since_start
}
)
ds
<xarray.Dataset> Dimensions: (depth: 5, time: 24) Coordinates: * depth (depth) int64 5 10 20 30 50 * time (time) int64 0 3600 7200 10800 14400 ... 72000 75600 79200 82800 Data variables: *empty*
We have multiple dimensions#
Let’s quickly create some arrays of values to use as different coordinate variables and add them to a new xarray object.
depth = [0,10,20,50,100]
latitude = [78.5425, 79.1423, 80.7139]
longitude = [30.0131,28.7269]
xrds = xr.Dataset(
coords = {
'depth': depth,
'latitude': latitude,
'longitude': longitude
}
)
xrds
<xarray.Dataset> Dimensions: (depth: 5, latitude: 3, longitude: 2) Coordinates: * depth (depth) int64 0 10 20 50 100 * latitude (latitude) float64 78.54 79.14 80.71 * longitude (longitude) float64 30.01 28.73 Data variables: *empty*
Data Variables#
Now let’s add some data variables. Starting from the xarray dataset object created directly above that has multiple dimensions.
You can choose what name you assign to each variable. This is not standardised, but be sensible and clear. Further down in this chapter, I will show you how to make your data variables conform to the CF conventions using variable attributes.
1D array, e.g. a depth profile#
chlorophyll_a = [21.5, 18.5, 17.6, 16.8, 15.2] # Must be same length as the depth dimension
xrds['chlorophyll_a'] = ("depth", chlorophyll_a)
xrds
<xarray.Dataset> Dimensions: (depth: 5, latitude: 3, longitude: 2) Coordinates: * depth (depth) int64 0 10 20 50 100 * latitude (latitude) float64 78.54 79.14 80.71 * longitude (longitude) float64 30.01 28.73 Data variables: chlorophyll_a (depth) float64 21.5 18.5 17.6 16.8 15.2
2D array, e.g. a grid of latitude and longitudes#
wind_speed = np.random.randint(0, 10, size=(3, 2)) # Creating a 2D array
wind_speed
array([[7, 3],
[5, 4],
[7, 9]])
xrds['wind_speed'] = (["latitude", "longitude"], wind_speed)
xrds
<xarray.Dataset> Dimensions: (depth: 5, latitude: 3, longitude: 2) Coordinates: * depth (depth) int64 0 10 20 50 100 * latitude (latitude) float64 78.54 79.14 80.71 * longitude (longitude) float64 30.01 28.73 Data variables: chlorophyll_a (depth) float64 21.5 18.5 17.6 16.8 15.2 wind_speed (latitude, longitude) int64 7 3 5 4 7 9
Now you can see that the wind_speed variable has two dimensions: latitude and longitude. It appears as a 1D array above, but when we retrieve it we see that it isn’t.
xrds['wind_speed']
<xarray.DataArray 'wind_speed' (latitude: 3, longitude: 2)> array([[7, 3], [5, 4], [7, 9]]) Coordinates: * latitude (latitude) float64 78.54 79.14 80.71 * longitude (longitude) float64 30.01 28.73
This is another major advantage of NetCDF files over tabular data formats like CSV or XLSX, which are limited in their ability to store multi-dimensional data. This multidimensional array can be used by code and software as it is without having to do any pre-processing.
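For example, you can select values by coordinate label rather than by array index. A minimal sketch with a small made-up grid:

```python
import numpy as np
import xarray as xr

# A small illustrative grid (the values are made up)
ds = xr.Dataset(coords={'latitude': [78.5, 79.1], 'longitude': [28.7, 30.0]})
ds['wind_speed'] = (['latitude', 'longitude'], np.array([[7, 3], [5, 4]]))

# Select by coordinate value - no manual index bookkeeping needed
point = ds['wind_speed'].sel(latitude=79.1, longitude=30.0)
```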
Example of 3D data#
temperature = np.random.randint(20,30, size=(3,2,5))
xrds['temperature'] = (["latitude", "longitude", "depth"], temperature)
xrds
<xarray.Dataset> Dimensions: (depth: 5, latitude: 3, longitude: 2) Coordinates: * depth (depth) int64 0 10 20 50 100 * latitude (latitude) float64 78.54 79.14 80.71 * longitude (longitude) float64 30.01 28.73 Data variables: chlorophyll_a (depth) float64 21.5 18.5 17.6 16.8 15.2 wind_speed (latitude, longitude) int64 7 3 5 4 7 9 temperature (latitude, longitude, depth) int64 26 24 27 21 ... 26 27 20
3D data from pandas dataframe to 3D grid#
Many people prefer to record their data first in a table, perhaps a CSV file or XLSX file. These data can be loaded into a pandas dataframe.
Is there a way to transform your tabular data into multidimensional arrays? I’ll create a dummy dataframe first.
# Create lists to store the coordinates and salinity values
depth_coordinates = []
latitude_coordinates = []
longitude_coordinates = []
salinity_values = []
# Generate the coordinates and salinity values for the grid
for d in depth:
for lat in latitude:
for lon in longitude:
depth_coordinates.append(d)
latitude_coordinates.append(lat)
longitude_coordinates.append(lon)
salinity = np.random.uniform(30, 35) # Random salinity value between 30 and 35
salinity_values.append(salinity)
# Create a DataFrame
data = {
'Depth': depth_coordinates,
'Latitude': latitude_coordinates,
'Longitude': longitude_coordinates,
'Salinity': salinity_values
}
df = pd.DataFrame(data)
df
| | Depth | Latitude | Longitude | Salinity |
|---|---|---|---|---|
0 | 0 | 78.5425 | 30.0131 | 34.300921 |
1 | 0 | 78.5425 | 28.7269 | 30.584718 |
2 | 0 | 79.1423 | 30.0131 | 31.365844 |
3 | 0 | 79.1423 | 28.7269 | 32.058863 |
4 | 0 | 80.7139 | 30.0131 | 33.704389 |
5 | 0 | 80.7139 | 28.7269 | 33.820409 |
6 | 10 | 78.5425 | 30.0131 | 30.893971 |
7 | 10 | 78.5425 | 28.7269 | 32.632514 |
8 | 10 | 79.1423 | 30.0131 | 34.565236 |
9 | 10 | 79.1423 | 28.7269 | 31.219493 |
10 | 10 | 80.7139 | 30.0131 | 30.334035 |
11 | 10 | 80.7139 | 28.7269 | 33.577165 |
12 | 20 | 78.5425 | 30.0131 | 34.009353 |
13 | 20 | 78.5425 | 28.7269 | 30.150542 |
14 | 20 | 79.1423 | 30.0131 | 34.906550 |
15 | 20 | 79.1423 | 28.7269 | 34.524013 |
16 | 20 | 80.7139 | 30.0131 | 32.806575 |
17 | 20 | 80.7139 | 28.7269 | 34.438574 |
18 | 50 | 78.5425 | 30.0131 | 31.559159 |
19 | 50 | 78.5425 | 28.7269 | 30.096311 |
20 | 50 | 79.1423 | 30.0131 | 34.638809 |
21 | 50 | 79.1423 | 28.7269 | 33.817651 |
22 | 50 | 80.7139 | 30.0131 | 30.926406 |
23 | 50 | 80.7139 | 28.7269 | 33.552499 |
24 | 100 | 78.5425 | 30.0131 | 30.585861 |
25 | 100 | 78.5425 | 28.7269 | 34.444628 |
26 | 100 | 79.1423 | 30.0131 | 31.558820 |
27 | 100 | 79.1423 | 28.7269 | 34.459547 |
28 | 100 | 80.7139 | 30.0131 | 30.770158 |
29 | 100 | 80.7139 | 28.7269 | 31.280053 |
Now, let’s create a multidimensional grid for our salinity variable. We need to be a bit careful with the order here. The dataframe is sorted first by depth (5 depths), then by latitude (3 latitudes), then by longitude (2 longitudes). We should mirror that order.
salinity_3d_array = np.array(df['Salinity']).reshape(5,3,2)
salinity_3d_array
array([[[34.30092136, 30.58471819],
[31.36584358, 32.05886312],
[33.7043894 , 33.82040892]],
[[30.89397148, 32.63251359],
[34.56523607, 31.21949273],
[30.33403543, 33.57716522]],
[[34.00935268, 30.15054185],
[34.90655047, 34.52401304],
[32.80657516, 34.43857368]],
[[31.5591591 , 30.09631052],
[34.63880863, 33.81765118],
[30.92640601, 33.5524991 ]],
[[30.58586148, 34.44462793],
[31.55881957, 34.4595471 ],
[30.77015808, 31.28005345]]])
xrds['salinity'] = (["depth", "latitude", "longitude"], salinity_3d_array)
# Careful again with the order of your dimensions - xarray will raise an error if the array's shape doesn't match the dimension lengths
xrds
<xarray.Dataset> Dimensions: (depth: 5, latitude: 3, longitude: 2) Coordinates: * depth (depth) int64 0 10 20 50 100 * latitude (latitude) float64 78.54 79.14 80.71 * longitude (longitude) float64 30.01 28.73 Data variables: chlorophyll_a (depth) float64 21.5 18.5 17.6 16.8 15.2 wind_speed (latitude, longitude) int64 7 3 5 4 7 9 temperature (latitude, longitude, depth) int64 26 24 27 21 ... 26 27 20 salinity (depth, latitude, longitude) float64 34.3 30.58 ... 31.28
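As an alternative to the manual reshape, pandas can build the grid for you: setting the coordinate columns as a MultiIndex and calling to_xarray infers the dimensions from the index levels. A minimal sketch with a hypothetical tidy table (one row per coordinate combination; the column names and values here are made up):

```python
import pandas as pd

# Hypothetical tidy table: one row per (Depth, Latitude, Longitude) combination
df = pd.DataFrame({
    'Depth':     [0, 0, 10, 10],
    'Latitude':  [78.5, 78.5, 78.5, 78.5],
    'Longitude': [28.7, 30.0, 28.7, 30.0],
    'Salinity':  [34.1, 33.9, 34.5, 34.2],
})

# set_index + to_xarray infers the grid from the index levels,
# so no manual reshape or dimension ordering is needed
xrds2 = df.set_index(['Depth', 'Latitude', 'Longitude']).to_xarray()
```

This approach is more robust than reshape when the table is not sorted in a predictable order.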
Metadata (attributes)#
Hurrah! Your data are in the xarray dataset object. But are you ready to export a NetCDF file? Will that file be compliant with the FAIR principles? No! We need metadata.
Variable attributes are metadata that describe the variables. Global attributes are metadata that describe the file as a whole. You can find a list of attributes here provided by the Climate & Forecast (CF) conventions: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#attribute-appendix
The table in the link above specifies which attributes can be used as global attributes and which can be used as variable attributes. Some attributes can be used as either.
The CF conventions are light on discovery metadata. Discovery metadata are metadata that can be used to find data. For example, when and where the data were collected and by whom, some keywords etc. So we also use the ACDD convention - The Attribute Convention for Data Discovery. https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3
This is a list of recommendations. SIOS advises that people follow the requirements of the Arctic Data Centre, since requirements are a more effective way to encourage consistency than recommendations. These requirements are compliant with the ACDD conventions: https://adc.met.no/node/4
Variable attributes#
The CF conventions provide examples of which variable attributes you should be including in your CF-NetCDF file. For example for latitude: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#latitude-coordinate
Let’s replicate that setup.
Additionally, the ACDD convention recommends that an attribute coverage_content_type is also added, which is used to state whether the data are modelResult, physicalMeasurement or something else; see the list here: https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3#Highly_Recommended_Variable_Attributes
And remember we might want to select additional applicable attributes for our variables from this section of the CF conventions: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#attribute-appendix
xrds['latitude'].attrs['standard_name'] = 'latitude' # One at a time
# Or all at once. Variable attributes can be written from a python dictionary
xrds['latitude'].attrs = {
'standard_name': 'latitude',
'long_name': 'latitude',
'units': 'degrees_north',
'coverage_content_type': 'coordinate'
}
xrds['latitude']
<xarray.DataArray 'latitude' (latitude: 3)> array([78.5425, 79.1423, 80.7139]) Coordinates: * latitude (latitude) float64 78.54 79.14 80.71 Attributes: standard_name: latitude long_name: latitude units: degrees_north coverage_content_type: coordinate
xrds['longitude'].attrs = {
'standard_name': 'longitude',
'long_name': 'longitude',
'units': 'degrees_east',
'coverage_content_type': 'coordinate'
}
xrds['depth'].attrs = {
'standard_name': 'depth',
'long_name': 'depth below sea level',
'units': 'meters',
'coverage_content_type': 'coordinate',
'positive': 'down'
}
xrds['chlorophyll_a'].attrs = {
'standard_name': 'mass_concentration_of_chlorophyll_a_in_sea_water',
'long_name': 'a description about each variable in your own words',
'units': 'μg m-3',
'coverage_content_type': 'physicalMeasurement',
}
xrds['salinity'].attrs = {
'standard_name': 'sea_water_salinity',
'long_name': 'a description about each variable in your own words',
'units': '1e-3',
'coverage_content_type': 'physicalMeasurement',
}
xrds['temperature'].attrs = {
'standard_name': 'sea_water_temperature',
'long_name': 'a description about each variable in your own words',
'units': 'degC', # A UDUNITS-compatible form of degrees Celsius
'coverage_content_type': 'physicalMeasurement',
}
xrds['wind_speed'].attrs = {
'standard_name': 'wind_speed',
'long_name': 'a description about each variable in your own words',
'units': 'm s-1',
'coverage_content_type': 'physicalMeasurement',
}
# And so on for each variable..
xrds
<xarray.Dataset> Dimensions: (depth: 5, latitude: 3, longitude: 2) Coordinates: * depth (depth) int64 0 10 20 50 100 * latitude (latitude) float64 78.54 79.14 80.71 * longitude (longitude) float64 30.01 28.73 Data variables: chlorophyll_a (depth) float64 21.5 18.5 17.6 16.8 15.2 wind_speed (latitude, longitude) int64 7 3 5 4 7 9 temperature (latitude, longitude, depth) int64 26 24 27 21 ... 26 27 20 salinity (depth, latitude, longitude) float64 34.3 30.58 ... 31.28
Global attributes#
As mentioned above, the requirements of the Arctic Data Centre for global attributes (based on the ACDD convention) can serve as a guide for which global attributes you should be including. https://adc.met.no/node/4
And remember we might want to select additional applicable global attributes from this section of the CF conventions: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#attribute-appendix
Go through and add each required attribute and any others you wish to. You are also welcome to add any custom attributes on top of these requirements. Similarly to variable attributes, this can either be done one by one or all in one in a dictionary.
xrds.attrs['title'] = 'my title' # One by one
# Or altogether (this will overwrite any you have previously added one by one)
xrds.attrs = {
'id': 'your_unique_id_here',
'naming_authority': 'institution that provides the id',
'title': 'my title',
'summary': 'analogous to an abstract in a paper, describing the data and how they were collected and processed',
'creator_type': 'person',
'creator_name': 'John Smith; Luke Marsden', # Who collected and processed the data up to this point
'creator_email': 'johns@unis.no; lukem@met.no',
'creator_institution': 'The University Centre in Svalbard (UNIS); Norwegian Meteorological Institute (MET)',
'creator_url': '; https://orcid.org/0000-0002-9746-544X', # OrcID is best practice if possible. Other URLs okay, or leave blank for authors that don't have one.
'time_coverage_start': '2020-05-10T08:14:58Z',
'time_coverage_end': '2020-05-10T11:51:12Z',
'keywords': 'wind_speed, sea_water_temperature, sea_water_salinity, mass_concentration_of_chlorophyll_a_in_sea_water',
'keywords_vocabulary': 'CF:NetCDF COARDS Climate and Forecast Standard Names',
'institution': 'Your Institution',
'publisher_name': 'Publisher Name', # Data centre where your data will be published
'publisher_email': 'publisher@email.com',
'publisher_url': 'publisher_url_here',
'license': 'https://creativecommons.org/licenses/by/4.0/',
'Conventions': 'ACDD-1.3, CF-1.8', # Choose which ever version you will check your file against using a compliance checker
'project': 'Your project name'
}
In this case, it makes sense to add some attributes based on information we have already provided.
xrds.attrs['geospatial_lat_min'] = min(xrds['latitude'].values)
xrds.attrs['geospatial_lat_max'] = max(xrds['latitude'].values)
xrds.attrs['geospatial_lon_min'] = min(xrds['longitude'].values)
xrds.attrs['geospatial_lon_max'] = max(xrds['longitude'].values)
We can include the current time in the date_created and history attributes.
dtnow = dt.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
xrds.attrs['date_created'] = dtnow
xrds.attrs['history'] = f'File created at {dtnow} using xarray in Python by John Smith'
xrds.attrs
{'id': 'your_unique_id_here',
'naming_authority': 'institution that provides the id',
'title': 'my title',
'summary': 'analogous to an abstract in a paper, describing the data and how they were collected and processed',
'creator_type': 'person',
'creator_name': 'John Smith; Luke Marsden',
'creator_email': 'johns@unis.no; lukem@met.no',
'creator_institution': 'The University Centre in Svalbard (UNIS); Norwegian Meteorological Institute (MET)',
'creator_url': '; https://orcid.org/0000-0002-9746-544X',
'time_coverage_start': '2020-05-10T08:14:58Z',
'time_coverage_end': '2020-05-10T11:51:12Z',
'keywords': 'wind_speed, sea_water_temperature, sea_water_salinity, mass_concentration_of_chlorophyll_a_in_sea_water',
'keywords_vocabulary': 'CF:NetCDF COARDS Climate and Forecast Standard Names',
'institution': 'Your Institution',
'publisher_name': 'Publisher Name',
'publisher_email': 'publisher@email.com',
'publisher_url': 'publisher_url_here',
'license': 'https://creativecommons.org/licenses/by/4.0/',
'Conventions': 'ACDD-1.3, CF-1.8',
'project': 'Your project name',
'geospatial_lat_min': 78.5425,
'geospatial_lat_max': 80.7139,
'geospatial_lon_min': 28.7269,
'geospatial_lon_max': 30.0131,
'date_created': '2024-09-25T12:17:52Z',
'history': 'File created at 2024-09-25T12:17:52Z using xarray in Python by John Smith'}
Exporting your xarray object to a NetCDF file#
Finally, you need to export your data. Firstly, you can specify how each variable should be encoded. This is an optional step - the encoding will be inferred from the data type in Python if you don’t specify it manually.
Fill values: The fill value will be used to fill in any missing values. It should be an unrealistic value that will obviously show up as a spike in the data when plotted. The _FillValue is a special variable attribute that some software understands, so when the data are opened, the fill values are replaced by NaNs again.
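A minimal sketch of what this encoding and decoding looks like, using plain numpy (the values are made up for illustration):

```python
import numpy as np

# A hypothetical measurement series with one missing value
data = np.array([21.5, 18.5, np.nan, 16.8])
fill_value = 1e30  # unrealistic value that is written to disk

# Encoding: replace NaNs with the fill value before writing
encoded = np.where(np.isnan(data), fill_value, data)

# Decoding: software that understands _FillValue restores the NaNs on reading
decoded = np.where(encoded == fill_value, np.nan, encoded)
```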
dtype: What type of data does your variable contain? Characters? Integers? Decimal numbers? Some commonly used dtype values are:
float32 / float64: Ideal for storing decimal numbers with single (32-bit) or double (64-bit) precision.
int32 / int64: Integers of different sizes. int32 is suitable in most cases. int64 is appropriate for very large integers or when precision is crucial.
S1 / S10 / S100: Useful for storing string data in arrays where strings have a consistent and known maximum length. S10 defines a string length of 10, for example.
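The dtype choice is a trade-off between file size and precision. A small sketch of the float64 to float32 conversion (illustrative values):

```python
import numpy as np

# float32 halves the storage of float64, at the cost of
# roughly 7 significant digits of precision
salinity = np.array([34.300921, 30.584718], dtype='float64')
as_float32 = salinity.astype('float32')
```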
# Specify encoding - you can write a file without this and the encoding will be assumed, but you should check it in any case.
myencoding = {
'depth': {
'dtype': 'int32',
'_FillValue': None # Coordinate variables should not have fill values.
},
'latitude': {
'dtype': 'float32',
'_FillValue': None # Coordinate variables should not have fill values.
},
'longitude': {
'dtype': 'float32',
'_FillValue': None # Coordinate variables should not have fill values.
},
'chlorophyll_a': {
'dtype': 'float32',
'_FillValue': 1e30,
'zlib': False
},
'wind_speed': {
'dtype': 'int32',
'_FillValue': 1e7,
'zlib': False
},
'temperature': {
'dtype': 'int32',
'_FillValue': 1e7,
'zlib': False
},
'salinity': {
'dtype': 'float32',
'_FillValue': 1e7,
'zlib': False
}
}
xrds.to_netcdf('../data/netcdf_files/04_example_multidimensional.nc',encoding=myencoding)
Checking your data#
Make sure you thoroughly check your file; ideally it should also be run past all co-authors, just like when publishing a paper.
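One quick sanity check is a round trip: write a dataset to disk, reopen it, and confirm the values survived unchanged. A minimal sketch (the file name and values are made up, and a NetCDF engine such as netCDF4 is assumed to be installed):

```python
import os
import tempfile

import xarray as xr

# Round-trip sketch: write a minimal dataset and reopen it
ds = xr.Dataset(coords={'depth': [0, 10]})
ds['salinity'] = ('depth', [34.1, 34.5])

path = os.path.join(tempfile.gettempdir(), 'roundtrip_check.nc')
ds.to_netcdf(path)

reopened = xr.open_dataset(path)
```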
There are also validators you can run your files through to make sure that your file is compliant with the ACDD and CF conventions before you publish it. For example: https://compliance.ioos.us/index.html
How to cite this course#
If you think this course contributed to the work you are doing, consider citing it in your list of references. Here is a recommended citation:
Marsden, L. (2024, April 19). NetCDF in Python - from beginner to pro. Zenodo. https://doi.org/10.5281/zenodo.10997447
And you can navigate to the publication and export the citation in different styles and formats by clicking the icon below.