04: Creating a CF-NetCDF file#
library(IRdisplay)
display_html('<iframe width="560" height="315" src="https://www.youtube.com/embed/IZDygRjfMIg?si=9Jp4P2k5daOikmJV" frameborder="0" allowfullscreen></iframe>')
In this session we will create a basic NetCDF file that is compliant with the Attribute Convention for Data Discovery (ACDD) and the Climate and Forecast (CF) conventions.
Firstly, let’s import the libraries that we will work with.
if (!requireNamespace("RNetCDF", quietly = TRUE)) {
install.packages("RNetCDF")
}
library(RNetCDF)
Initialising your file#
Let’s first create an empty object that we are going to use.
ncds <- create.nc("../data/exported_from_notebooks/test.nc")
print.nc(ncds)
netcdf classic {
}
Dimensions and coordinate variables#
Dimensions define the shape of your data. Variables (your data) can be assigned one or more dimensions. A dimension in most cases is a spatial or temporal dimension (e.g. time, depth, latitude, longitude) but could also be something else (e.g. iteration, number of vertices for data representative of cells).
Dimensions tell you how many points you have for each coordinate. Coordinate variables tell you what the values for those points are.
Let’s imagine a few simple scenarios. I’ll initialise a new NetCDF dataset each time.
1 dimension - depth#
ncds <- create.nc("../data/exported_from_notebooks/empty.nc")
depths <- c(0,10,20,30,50,100)
num_depths = length(depths)
dim.def.nc(ncds,"depth",num_depths)
print.nc(ncds)
netcdf classic {
dimensions:
depth = 6 ;
}
You then need to add a coordinate variable (I’ll again call it depth) which has a dimension of depth. It is quite common for the dimension and coordinate variable to have the same name.
First we define the variable. In the var.def.nc function below, the first argument ncds is my NetCDF file, depth is the name I am giving to the variable, NC_INT states that the values will be integers, and the final argument depth says that this variable has one dimension called depth.
var.def.nc(ncds,"depth","NC_INT","depth")
print.nc(ncds)
netcdf classic {
dimensions:
depth = 6 ;
variables:
NC_INT depth(depth) ;
}
Only now can we add our values to the variables.
var.put.nc(ncds,"depth", depths)
print.nc(ncds)
netcdf classic {
dimensions:
depth = 6 ;
variables:
NC_INT depth(depth) ;
}
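Note that print.nc shows only the structure of the file, not the values, so the output looks the same before and after writing. A minimal sketch of reading the values back to confirm they were written is shown here; the temporary file path is an assumption made for illustration.

```r
library(RNetCDF)

# Hypothetical temporary file, so the example does not touch your data folder
path <- tempfile(fileext = ".nc")
nc <- create.nc(path)

depths <- c(0, 10, 20, 30, 50, 100)
dim.def.nc(nc, "depth", length(depths))
var.def.nc(nc, "depth", "NC_INT", "depth")
var.put.nc(nc, "depth", depths)

# Read the coordinate values back from the file
depths_read <- var.get.nc(nc, "depth")
print(depths_read)

# Query the dimension to confirm how many points it holds
depth_length <- dim.inq.nc(nc, "depth")$length
print(depth_length)

close.nc(nc)
```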
A key feature of a NetCDF file is that there is a defined structure so your data and metadata will always be in the same place within the file. This makes it easier for a machine to read it. We will add more types of data and metadata as we go, but first a few more examples.
A time series of data#
I’ll create a list of timestamps for myself first.
timestamps <- list(
as.POSIXct("2023-06-18 00:00:00", tz = "UTC"),
as.POSIXct("2023-06-18 03:00:00", tz = "UTC"),
as.POSIXct("2023-06-18 06:00:00", tz = "UTC"),
as.POSIXct("2023-06-18 09:00:00", tz = "UTC"),
as.POSIXct("2023-06-18 12:00:00", tz = "UTC"),
as.POSIXct("2023-06-18 15:00:00", tz = "UTC"),
as.POSIXct("2023-06-18 18:00:00", tz = "UTC"),
as.POSIXct("2023-06-18 21:00:00", tz = "UTC")
)
print(timestamps)
[[1]]
[1] "2023-06-18 UTC"
[[2]]
[1] "2023-06-18 03:00:00 UTC"
[[3]]
[1] "2023-06-18 06:00:00 UTC"
[[4]]
[1] "2023-06-18 09:00:00 UTC"
[[5]]
[1] "2023-06-18 12:00:00 UTC"
[[6]]
[1] "2023-06-18 15:00:00 UTC"
[[7]]
[1] "2023-06-18 18:00:00 UTC"
[[8]]
[1] "2023-06-18 21:00:00 UTC"
There are specific recommendations on how time should be stored in NetCDF-CF files. I will try to explain briefly here, and there is a nice explanation here too: https://www.unidata.ucar.edu/software/netcdf/time/recs.html
It is most common to have a dimension named “time” as well as a coordinate variable with the same name. Let’s discuss the variable first.
The “time” variable has units that count from a user defined origin, for example “hours since 2020-01-01 00:00 UTC” or “days since 2014-01-01”. The units may be in years, days, seconds, nanoseconds, etc. Whilst this approach may seem strange at a glance, it allows the times to be stored in a conventional numerical format such as integers or floats, and to our desired precision. This is much more efficient than using a long timestamp string for each time.
Some software (e.g. xarray in Python, Panoply) knows how to interpret this and will convert the values into timestamps when you extract the data from a CF-NetCDF file. Unfortunately, at the time of writing, RNetCDF cannot do this.
Let’s see how we can convert our list of timestamps above into this format.
# Calculate the time differences in hours since the first timestamp
time_diff_hours <- sapply(timestamps, function(ts) as.integer(difftime(ts, timestamps[[1]], units = "hours")))
print(time_diff_hours)
[1] 0 3 6 9 12 15 18 21
num_times = length(time_diff_hours)
ncds <- create.nc("../data/exported_from_notebooks/1d.nc")
dim.def.nc(ncds,"time",num_times)
var.def.nc(ncds,"time","NC_INT","time")
var.put.nc(ncds,"time", time_diff_hours)
print.nc(ncds)
netcdf classic {
dimensions:
time = 8 ;
variables:
NC_INT time(time) ;
}
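The integers written above only become times once the origin and step are stated. A minimal sketch, assuming the origin is the first timestamp used earlier (2023-06-18 00:00 UTC), of adding a units attribute to the time variable and decoding the values by hand in base R. The temporary file path and the string parsing are illustrative assumptions, not the only way to do this.

```r
library(RNetCDF)

# Hypothetical temporary file for illustration
path <- tempfile(fileext = ".nc")
nc <- create.nc(path)

hours <- c(0, 3, 6, 9, 12, 15, 18, 21)
dim.def.nc(nc, "time", length(hours))
var.def.nc(nc, "time", "NC_INT", "time")
var.put.nc(nc, "time", hours)

# State the origin and step so the integers can be interpreted as times
att.put.nc(nc, "time", "units", "NC_CHAR", "hours since 2023-06-18T00:00:00Z")

# Decode manually: parse the origin out of the units string, then add the offsets
time_units <- att.get.nc(nc, "time", "units")
origin <- as.POSIXct(sub("hours since ", "", time_units),
                     format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
decoded <- origin + as.numeric(var.get.nc(nc, "time")) * 3600  # hours to seconds
print(format(decoded, "%Y-%m-%d %H:%M:%S"))

close.nc(nc)
```

This sketch only handles units of the form "hours since ..."; other steps (days, seconds) would need a different multiplier.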
Multiple dimensions#
Now let’s create a NetCDF file with multiple dimensions.
ncds <- create.nc("../data/exported_from_notebooks/3d.nc")
depths <- c(0,10,20,30,50,100)
latitudes <- c(78.5271,79.2316,80.3261)
longitudes <- c(30.1515,28.5810)
dim.def.nc(ncds,"depth",length(depths))
dim.def.nc(ncds,"latitude",length(latitudes))
dim.def.nc(ncds,"longitude",length(longitudes))
var.def.nc(ncds,"depth","NC_INT","depth")
var.def.nc(ncds,"latitude","NC_FLOAT","latitude") # Values have decimal places, so NC_FLOAT
var.def.nc(ncds,"longitude","NC_FLOAT","longitude") # Values have decimal places, so NC_FLOAT
var.put.nc(ncds, "depth", depths)
var.put.nc(ncds, "latitude", latitudes)
var.put.nc(ncds, "longitude", longitudes)
print.nc(ncds)
netcdf classic {
dimensions:
depth = 6 ;
latitude = 3 ;
longitude = 2 ;
variables:
NC_INT depth(depth) ;
NC_FLOAT latitude(latitude) ;
NC_FLOAT longitude(longitude) ;
}
Data Variables#
Now let’s add some data variables.
You can choose what name you assign for each variable. This is not standardised, but be sensible and clear. I will show you how to make your data variables conform to the CF conventions using variable attributes in the next section.
1D variable#
depths <- c(0,10,20,30,50,100)
chlorophyll_a <- c(21.5, 18.5, 17.6, 16.8, 15.2, 14.8) # Must be same length as the dimension
ncds <- create.nc("../data/exported_from_notebooks/1d_chla.nc")
# Dimension and coordinate variable
dim.def.nc(ncds,"depth",length(depths))
var.def.nc(ncds,"depth","NC_INT","depth")
var.put.nc(ncds, "depth", depths)
# Data variable (chlorophyll_a) with 1 dimension (depth)
var.def.nc(ncds,"chlorophyll_a", "NC_FLOAT", "depth")
var.put.nc(ncds,"chlorophyll_a", chlorophyll_a)
print.nc(ncds)
print(var.get.nc(ncds,"chlorophyll_a"))
netcdf classic {
dimensions:
depth = 6 ;
variables:
NC_INT depth(depth) ;
NC_FLOAT chlorophyll_a(depth) ;
}
[1] 21.5 18.5 17.6 16.8 15.2 14.8
2D variable#
Now a 2D variable, e.g. a grid of latitudes and longitudes.
latitudes <- c(78.5271,79.2316,80.3261)
longitudes <- c(30.1515,28.5810)
# Create random wind speed values
wind_speed <- runif(length(latitudes) * length(longitudes), min = 0, max = 10)
# Reshape the wind speed values to match the latitude and longitude dimensions
wind_speed <- array(wind_speed, dim = c(length(latitudes), length(longitudes)))
print(wind_speed)
[,1] [,2]
[1,] 5.734686 3.243470
[2,] 7.617893 2.318425
[3,] 3.316931 7.016739
ncds <- create.nc("../data/exported_from_notebooks/2d_wind_speed.nc")
# Dimensions and coordinate variables
dim.def.nc(ncds,"latitude",length(latitudes))
dim.def.nc(ncds,"longitude",length(longitudes))
var.def.nc(ncds,"latitude","NC_FLOAT","latitude")
var.def.nc(ncds,"longitude","NC_FLOAT","longitude")
var.put.nc(ncds, "latitude", latitudes)
var.put.nc(ncds, "longitude", longitudes)
# Data variable with 2 dimensions
var.def.nc(ncds, "wind_speed", "NC_FLOAT", c("latitude", "longitude"))
var.put.nc(ncds, "wind_speed", wind_speed)
print.nc(ncds)
print(var.get.nc(ncds, "wind_speed"))
netcdf classic {
dimensions:
latitude = 3 ;
longitude = 2 ;
variables:
NC_FLOAT latitude(latitude) ;
NC_FLOAT longitude(longitude) ;
NC_FLOAT wind_speed(latitude, longitude) ;
}
[,1] [,2]
[1,] 5.734686 3.243470
[2,] 7.617893 2.318425
[3,] 3.316931 7.016739
Now you can see that the wind_speed variable has two dimensions: latitude and longitude. This is another major advantage of NetCDF files over tabular data formats like CSV or XLSX, which are limited in their ability to store multi-dimensional data. This multidimensional array can be used by code and software as it is, without any pre-processing.
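Because the variable is stored as a grid, part of it can be read without loading the whole array. A minimal sketch, using fixed dummy values instead of random ones and a hypothetical temporary file, of reading one latitude row with the start and count arguments of var.get.nc:

```r
library(RNetCDF)

# Hypothetical temporary file for illustration
path <- tempfile(fileext = ".nc")
nc <- create.nc(path)

latitudes <- c(78.5271, 79.2316, 80.3261)
longitudes <- c(30.1515, 28.5810)
# Fixed dummy values so the result is easy to follow
wind_speed <- array(c(1, 2, 3, 4, 5, 6),
                    dim = c(length(latitudes), length(longitudes)))

dim.def.nc(nc, "latitude", length(latitudes))
dim.def.nc(nc, "longitude", length(longitudes))
var.def.nc(nc, "wind_speed", "NC_FLOAT", c("latitude", "longitude"))
var.put.nc(nc, "wind_speed", wind_speed)

# Read only the second latitude, across both longitudes
row2 <- var.get.nc(nc, "wind_speed", start = c(2, 1), count = c(1, 2))
print(row2)

close.nc(nc)
```

The start and count vectors follow the same dimension order that was used when the variable was defined.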
3D variable#
depths <- c(0,10,20,30,50,100)
latitudes <- c(78.5271,79.2316,80.3261)
longitudes <- c(30.1515,28.5810)
sea_water_temperature <- runif(length(depths) * length(latitudes) * length(longitudes), min = 0, max = 2)
# Reshape the sea water temperature values to match the depth, latitude, and longitude dimensions
sea_water_temperature <- array(sea_water_temperature, dim = c(length(depths), length(latitudes), length(longitudes)))
print(sea_water_temperature)
ncds <- create.nc("../data/exported_from_notebooks/3d_sea_water_temperature.nc")
# Dimensions and coordinate variables
dim.def.nc(ncds,"depth",length(depths))
dim.def.nc(ncds,"latitude",length(latitudes))
dim.def.nc(ncds,"longitude",length(longitudes))
var.def.nc(ncds,"depth","NC_INT","depth")
var.def.nc(ncds,"latitude","NC_FLOAT","latitude")
var.def.nc(ncds,"longitude","NC_FLOAT","longitude")
var.put.nc(ncds, "depth", depths)
var.put.nc(ncds, "latitude", latitudes)
var.put.nc(ncds, "longitude", longitudes)
# Data variable with 3 dimensions
var.def.nc(ncds, "sea_water_temperature", "NC_FLOAT", c("depth", "latitude", "longitude"))
var.put.nc(ncds, "sea_water_temperature", sea_water_temperature)
print.nc(ncds)
, , 1
[,1] [,2] [,3]
[1,] 0.8094903 0.76118706 1.2408141
[2,] 1.4316365 1.80988373 1.2337254
[3,] 0.2142685 0.00168447 1.6689923
[4,] 0.5448078 0.48451312 0.4078551
[5,] 0.9243428 0.23839650 0.6491477
[6,] 1.5084980 1.78478335 0.1019812
, , 2
[,1] [,2] [,3]
[1,] 0.05114846 1.95800969 1.1021331
[2,] 1.14057254 0.08993623 1.3976797
[3,] 0.82082577 0.63020738 0.5110562
[4,] 0.63302837 1.81392065 1.2528194
[5,] 1.13992176 0.78656754 0.9194456
[6,] 1.27260120 1.33143783 1.5350169
netcdf classic {
dimensions:
depth = 6 ;
latitude = 3 ;
longitude = 2 ;
variables:
NC_INT depth(depth) ;
NC_FLOAT latitude(latitude) ;
NC_FLOAT longitude(longitude) ;
NC_FLOAT sea_water_temperature(depth, latitude, longitude) ;
}
3D data from data frame#
What if you have your data in Excel or a CSV file or some other tabular format? We can load in the data to a dataframe and then convert the data to a 3D array.
The code below is simply creating a dummy dataframe to use in this example.
depths <- c(0,10,20,30,50,100)
latitudes <- c(78.5271,79.2316,80.3261)
longitudes <- c(30.1515,28.5810)
# Create lists to store the coordinates and salinity values
depth_coordinates <- c()
latitude_coordinates <- c()
longitude_coordinates <- c()
salinity_values <- c()
# Generate the coordinates and salinity values for the grid
for (d in depths) {
for (lat in latitudes) {
for (lon in longitudes) {
depth_coordinates <- c(depth_coordinates, rep(d, 1))
latitude_coordinates <- c(latitude_coordinates, rep(lat, 1))
longitude_coordinates <- c(longitude_coordinates, rep(lon, 1))
salinity <- runif(1, min = 30, max = 35) # Random salinity value between 30 and 35
salinity_values <- c(salinity_values, salinity)
}
}
}
# Create a DataFrame
data <- data.frame(
Depth = depth_coordinates,
Latitude = latitude_coordinates,
Longitude = longitude_coordinates,
Salinity = salinity_values
)
head(data)
| | Depth | Latitude | Longitude | Salinity |
|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <dbl> |
| 1 | 0 | 78.5271 | 30.1515 | 33.42550 |
| 2 | 0 | 78.5271 | 28.5810 | 34.04202 |
| 3 | 0 | 79.2316 | 30.1515 | 34.58183 |
| 4 | 0 | 79.2316 | 28.5810 | 30.25576 |
| 5 | 0 | 80.3261 | 30.1515 | 33.63557 |
| 6 | 0 | 80.3261 | 28.5810 | 33.04292 |
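Now, let’s create a multidimensional grid for our salinity variable. We need to be careful with the order here. The dataframe is sorted first by depth (6 depths), then by latitude (3 latitudes), then by longitude (2 longitudes), so longitude changes fastest from row to row. R fills an array with the first dimension changing fastest, so we fill the array with the dimensions in reverse order (longitude, latitude, depth) and then use aperm to reorder them to (depth, latitude, longitude).
# Fill in the order the rows are stored: longitude changes fastest, depth slowest
salinity_3d_array <- array(data$Salinity, dim = c(length(longitudes), length(latitudes), length(depths)))
# Reorder the dimensions to (depth, latitude, longitude)
salinity_3d_array <- aperm(salinity_3d_array, c(3, 2, 1))
print(salinity_3d_array)
, , 1
[,1] [,2] [,3]
[1,] 33.42550 34.58183 33.63557
[2,] 33.53527 34.57929 33.49634
[3,] 33.84588 33.91021 32.94992
[4,] 33.99032 31.80735 30.34847
[5,] 33.18225 34.32434 32.11048
[6,] 33.36128 34.08890 32.14931
, , 2
[,1] [,2] [,3]
[1,] 34.04202 30.25576 33.04292
[2,] 30.83869 34.88842 30.41398
[3,] 33.34692 34.74495 34.97071
[4,] 30.07425 33.71095 34.54513
[5,] 32.28305 31.18969 31.45586
[6,] 31.70092 34.83206 34.57356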
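Getting the order right matters, because a wrongly ordered array still has the correct shape and will be written without any warning. One way to avoid relying on row order is to place each row of the table into the grid by looking up its coordinates. The sketch below is an alternative approach in base R, using a shuffled dummy table with placeholder values; it assumes every coordinate value in the table appears exactly in the coordinate vectors.

```r
depths <- c(0, 10, 20, 30, 50, 100)
latitudes <- c(78.5271, 79.2316, 80.3261)
longitudes <- c(30.1515, 28.5810)

# Dummy table with one row per grid point and placeholder salinity values
data <- expand.grid(Longitude = longitudes, Latitude = latitudes, Depth = depths)
data$Salinity <- seq_len(nrow(data))

# Shuffle the rows to show that the row order does not matter
set.seed(1)
data <- data[sample(nrow(data)), ]

# For each row, find its index along each dimension
idx <- cbind(
  match(data$Depth, depths),
  match(data$Latitude, latitudes),
  match(data$Longitude, longitudes)
)

# Start with an empty grid, then place each value at its own coordinates
salinity_3d_array <- array(NA_real_,
                           dim = c(length(depths), length(latitudes), length(longitudes)))
salinity_3d_array[idx] <- data$Salinity

# Spot check one grid point against the table
expected <- data$Salinity[data$Depth == 10 & data$Latitude == 80.3261 & data$Longitude == 30.1515]
print(salinity_3d_array[2, 3, 1] == expected)
```

Any grid point missing from the table stays NA, which makes gaps easy to spot.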
Now we just need to write the data to a NetCDF file, including our 3D array.
ncds <- create.nc("../data/exported_from_notebooks/3d_sea_water_salinity.nc")
# Dimensions and coordinate variables
dim.def.nc(ncds,"depth",length(depths))
dim.def.nc(ncds,"latitude",length(latitudes))
dim.def.nc(ncds,"longitude",length(longitudes))
var.def.nc(ncds,"depth","NC_INT","depth")
var.def.nc(ncds,"latitude","NC_FLOAT","latitude")
var.def.nc(ncds,"longitude","NC_FLOAT","longitude")
var.put.nc(ncds, "depth", depths)
var.put.nc(ncds, "latitude", latitudes)
var.put.nc(ncds, "longitude", longitudes)
# Data variable with 3 dimensions
var.def.nc(ncds, "salinity", "NC_FLOAT", c("depth", "latitude", "longitude"))
var.put.nc(ncds, "salinity", salinity_3d_array)
print.nc(ncds)
netcdf classic {
dimensions:
depth = 6 ;
latitude = 3 ;
longitude = 2 ;
variables:
NC_INT depth(depth) ;
NC_FLOAT latitude(latitude) ;
NC_FLOAT longitude(longitude) ;
NC_FLOAT salinity(depth, latitude, longitude) ;
}
Data on irregular grids or instruments that move#
Sometimes your data don’t fall on a regular grid. Sometimes you have an instrument that moves whilst recording data every second, for example. In these scenarios, it is not practical to try to assign multiple dimensions to your data. You will end up with a lot of empty space in your file.
Instead, you can use a single dimension, time, and make each of your coordinate variables (e.g. latitude, longitude, depth) 1D with time as the only dimension.
ncds <- create.nc("../data/exported_from_notebooks/instrument_that_moves.nc")
# Define dimensions
dim.def.nc(ncds, "time", 10)
# Define variables using time as the dimension
var.def.nc(ncds, "time", "NC_INT", c("time"))
var.def.nc(ncds, "latitude", "NC_FLOAT", c("time"))
var.def.nc(ncds, "longitude", "NC_FLOAT", c("time"))
var.def.nc(ncds, "depth", "NC_FLOAT", c("time"))
# Create example data
time_data <- seq(1, 10)
latitude_data <- rep(35.0, 10)
longitude_data <- seq(-120.0, -119.0, length.out = 10)
depth_data <- rep(10, 10)
# Write data to the variables
var.put.nc(ncds, "time", time_data)
var.put.nc(ncds, "latitude", latitude_data)
var.put.nc(ncds, "longitude", longitude_data)
var.put.nc(ncds, "depth", depth_data)
print.nc(ncds)
netcdf classic {
dimensions:
time = 10 ;
variables:
NC_INT time(time) ;
NC_FLOAT latitude(time) ;
NC_FLOAT longitude(time) ;
NC_FLOAT depth(time) ;
}
If all the points were recorded at the same time (e.g. a point cloud) you can use an arbitrary dimension called something like point and make each coordinate and data variable 1D.
ncds <- create.nc("../data/exported_from_notebooks/irregular_grid.nc")
# Define dimensions
dim.def.nc(ncds, "point", 10)
# Define variables using point as the dimension
var.def.nc(ncds, "latitude", "NC_FLOAT", c("point"))
var.def.nc(ncds, "longitude", "NC_FLOAT", c("point"))
var.def.nc(ncds, "depth", "NC_FLOAT", c("point"))
# Create example data
latitude_data <- rep(35.0, 10)
longitude_data <- seq(-120.0, -119.0, length.out = 10)
depth_data <- rep(10, 10)
# Write data to the variables
var.put.nc(ncds, "latitude", latitude_data)
var.put.nc(ncds, "longitude", longitude_data)
var.put.nc(ncds, "depth", depth_data)
print.nc(ncds)
netcdf classic {
dimensions:
point = 10 ;
variables:
NC_FLOAT latitude(point) ;
NC_FLOAT longitude(point) ;
NC_FLOAT depth(point) ;
}
Metadata (attributes)#
Hurrah! Your data are in a NetCDF file. But is that file compliant with the FAIR principles? No! We need metadata.
Variable attributes are metadata that describe the variables. Global attributes are metadata that describe the file as a whole. You can find a list of attributes here provided by the Climate & Forecast (CF) conventions: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#attribute-appendix
The table in the link above specifies which attributes can be used as global attributes and which can be used as variable attributes. Some attributes can be used as either.
The CF conventions are light on discovery metadata. Discovery metadata are metadata that can be used to find data. For example, when and where the data were collected and by whom, some keywords etc. So we also use the ACDD convention - The Attribute Convention for Data Discovery. https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3
This is a list of recommendations. SIOS advises that people follow the requirements of the Arctic Data Centre, linked below. Requirements are a more effective way to encourage consistency than recommendations. These requirements are compliant with the ACDD conventions: https://adc.met.no/node/4
Variable attributes#
The CF conventions provide examples of which variable attributes you should be including in your CF-NetCDF file. For example for latitude: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#latitude-coordinate
Additionally, the ACDD convention recommends that an attribute coverage_content_type is also added, which is used to state whether the data are modelResult, physicalMeasurement or something else, see the list here: https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3#Highly_Recommended_Variable_Attributes
Descriptions for the different options for coverage_content_type can be found here: https://wiki.esipfed.org/ISO_19115_and_19115-2_CodeList_Dictionaries#MD_CoverageContentTypeCode
And remember we might want to select additional applicable attributes for our variables from this section of the CF conventions: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#attribute-appendix
ncds <- create.nc("../data/exported_from_notebooks/3d_sea_water_salinity.nc")
# Dimensions and coordinate variables
dim.def.nc(ncds,"depth",length(depths))
dim.def.nc(ncds,"latitude",length(latitudes))
dim.def.nc(ncds,"longitude",length(longitudes))
var.def.nc(ncds,"depth","NC_INT","depth")
var.def.nc(ncds,"latitude","NC_FLOAT","latitude")
var.def.nc(ncds,"longitude","NC_FLOAT","longitude")
var.put.nc(ncds, "depth", depths)
var.put.nc(ncds, "latitude", latitudes)
var.put.nc(ncds, "longitude", longitudes)
# Data variable with 3 dimensions
var.def.nc(ncds, "salinity", "NC_FLOAT", c("depth", "latitude", "longitude"))
var.put.nc(ncds, "salinity", salinity_3d_array)
print.nc(ncds)
netcdf classic {
dimensions:
depth = 6 ;
latitude = 3 ;
longitude = 2 ;
variables:
NC_INT depth(depth) ;
NC_FLOAT latitude(latitude) ;
NC_FLOAT longitude(longitude) ;
NC_FLOAT salinity(depth, latitude, longitude) ;
}
att.put.nc(ncds, "latitude", "standard_name", "NC_CHAR", "latitude")
att.put.nc(ncds, "latitude", "long_name", "NC_CHAR", "latitude")
att.put.nc(ncds, "latitude", "units", "NC_CHAR", "degrees_north")
att.put.nc(ncds, "latitude", "coverage_content_type", "NC_CHAR", "coordinate")
att.put.nc(ncds, "longitude", "standard_name", "NC_CHAR", "longitude")
att.put.nc(ncds, "longitude", "long_name", "NC_CHAR", "longitude")
att.put.nc(ncds, "longitude", "units", "NC_CHAR", "degrees_east")
att.put.nc(ncds, "longitude", "coverage_content_type", "NC_CHAR", "coordinate")
att.put.nc(ncds, "depth", "standard_name", "NC_CHAR", "depth")
att.put.nc(ncds, "depth", "long_name", "NC_CHAR", "depth below sea level")
att.put.nc(ncds, "depth", "units", "NC_CHAR", "meters")
att.put.nc(ncds, "depth", "coverage_content_type", "NC_CHAR", "coordinate")
att.put.nc(ncds, "depth", "positive", "NC_CHAR", "down")
att.put.nc(ncds, "salinity", "standard_name", "NC_CHAR", "sea_water_salinity")
att.put.nc(ncds, "salinity", "long_name", "NC_CHAR", "a description about the variable in your own words")
att.put.nc(ncds, "salinity", "units", "NC_CHAR", "psu")
att.put.nc(ncds, "salinity", "coverage_content_type", "NC_CHAR", "modelResult")
print.nc(ncds)
netcdf classic {
dimensions:
depth = 6 ;
latitude = 3 ;
longitude = 2 ;
variables:
NC_INT depth(depth) ;
NC_CHAR depth:standard_name = "depth" ;
NC_CHAR depth:long_name = "depth below sea level" ;
NC_CHAR depth:units = "meters" ;
NC_CHAR depth:coverage_content_type = "coordinate" ;
NC_CHAR depth:positive = "down" ;
NC_FLOAT latitude(latitude) ;
NC_CHAR latitude:standard_name = "latitude" ;
NC_CHAR latitude:long_name = "latitude" ;
NC_CHAR latitude:units = "degrees_north" ;
NC_CHAR latitude:coverage_content_type = "coordinate" ;
NC_FLOAT longitude(longitude) ;
NC_CHAR longitude:standard_name = "longitude" ;
NC_CHAR longitude:long_name = "longitude" ;
NC_CHAR longitude:units = "degrees_east" ;
NC_CHAR longitude:coverage_content_type = "coordinate" ;
NC_FLOAT salinity(depth, latitude, longitude) ;
NC_CHAR salinity:standard_name = "sea_water_salinity" ;
NC_CHAR salinity:long_name = "a description about the variable in your own words" ;
NC_CHAR salinity:units = "psu" ;
NC_CHAR salinity:coverage_content_type = "modelResult" ;
}
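When a variable has several attributes, it can be tidier to keep them in a named list and write them in a loop. A minimal sketch, assuming a hypothetical temporary file and reusing the depth attributes from above, that also reads one attribute back with att.get.nc to confirm it was stored:

```r
library(RNetCDF)

# Hypothetical temporary file for illustration
path <- tempfile(fileext = ".nc")
nc <- create.nc(path)
dim.def.nc(nc, "depth", 6)
var.def.nc(nc, "depth", "NC_INT", "depth")

# Attributes for the depth variable, all stored as text
depth_attributes <- list(
  standard_name = "depth",
  long_name = "depth below sea level",
  units = "meters",
  positive = "down"
)

for (name in names(depth_attributes)) {
  att.put.nc(nc, "depth", name, "NC_CHAR", depth_attributes[[name]])
}

# Read one attribute back and count how many the variable now has
units_read <- att.get.nc(nc, "depth", "units")
n_attributes <- var.inq.nc(nc, "depth")$natts
print(units_read)
print(n_attributes)

close.nc(nc)
```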
Global attributes#
As mentioned above, the requirements of the Arctic Data Centre for global attributes (based on the ACDD convention) can serve as a guide for which global attributes you should be including. https://adc.met.no/node/4
And remember we might want to select additional applicable global attributes from this section of the CF conventions: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#attribute-appendix
Go through and add each required attribute and any others you wish to. You are also welcome to add any custom attributes on top of these requirements.
In RNetCDF, the syntax for adding a global attribute is the same as for adding a variable attribute, but we use a special “variable” name NC_GLOBAL.
# Define the global attributes as an R list
attributes <- list(
id = "your_unique_id_here",
naming_authority = "institution that provides the id",
title = "my title",
summary = "analogous to an abstract in the paper, describing the data and how they were collected and processed",
creator_type = "person",
creator_name = "John Smith; Luke Marsden", # Who collected and processed the data up to this point
creator_email = "johns@unis.no; lukem@met.no",
creator_institution = "The University Centre in Svalbard (UNIS); Norwegian Meteorological Institute (MET)",
creator_url = "; https://orcid.org/0000-0002-9746-544X", # OrcID is best practice if possible. Other URLs okay, or leave blank for authors that don't have one.
time_coverage_start = "2020-05-10T08:14:58Z",
time_coverage_end = "2020-05-10T11:51:12Z",
keywords = "sea_water_salinity",
keywords_vocabulary = "CF:NetCDF COARDS Climate and Forecast Standard Names",
institution = "Your Institution",
publisher_name = "Publisher Name", # Data centre where your data will be published
publisher_email = "publisher@email.com",
publisher_url = "publisher_url_here",
license = "https://creativecommons.org/licenses/by/4.0/",
Conventions = "ACDD-1.3, CF-1.8", # Choose whichever version you will check your file against using a compliance checker
project = "Your project name"
)
# Loop through the attributes and add them to the NetCDF file.
for (key in names(attributes)) {
att.put.nc(ncds, "NC_GLOBAL", key, "NC_CHAR", attributes[[key]])
}
# These attributes are all "NC_CHAR" format. You will need to adjust the code if writing other formats.
print.nc(ncds)
netcdf classic {
dimensions:
depth = 6 ;
latitude = 3 ;
longitude = 2 ;
variables:
NC_INT depth(depth) ;
NC_CHAR depth:standard_name = "depth" ;
NC_CHAR depth:long_name = "depth below sea level" ;
NC_CHAR depth:units = "meters" ;
NC_CHAR depth:coverage_content_type = "coordinate" ;
NC_CHAR depth:positive = "down" ;
NC_FLOAT latitude(latitude) ;
NC_CHAR latitude:standard_name = "latitude" ;
NC_CHAR latitude:long_name = "latitude" ;
NC_CHAR latitude:units = "degrees_north" ;
NC_CHAR latitude:coverage_content_type = "coordinate" ;
NC_FLOAT longitude(longitude) ;
NC_CHAR longitude:standard_name = "longitude" ;
NC_CHAR longitude:long_name = "longitude" ;
NC_CHAR longitude:units = "degrees_east" ;
NC_CHAR longitude:coverage_content_type = "coordinate" ;
NC_FLOAT salinity(depth, latitude, longitude) ;
NC_CHAR salinity:standard_name = "sea_water_salinity" ;
NC_CHAR salinity:long_name = "a description about the variable in your own words" ;
NC_CHAR salinity:units = "psu" ;
NC_CHAR salinity:coverage_content_type = "modelResult" ;
// global attributes:
NC_CHAR :id = "your_unique_id_here" ;
NC_CHAR :naming_authority = "institution that provides the id" ;
NC_CHAR :title = "my title" ;
NC_CHAR :summary = "analogous to an abstract in the paper, describing the data and how they were collected and processed" ;
NC_CHAR :creator_type = "person" ;
NC_CHAR :creator_name = "John Smith; Luke Marsden" ;
NC_CHAR :creator_email = "johns@unis.no; lukem@met.no" ;
NC_CHAR :creator_institution = "The University Centre in Svalbard (UNIS); Norwegian Meteorological Institute (MET)" ;
NC_CHAR :creator_url = "; https://orcid.org/0000-0002-9746-544X" ;
NC_CHAR :time_coverage_start = "2020-05-10T08:14:58Z" ;
NC_CHAR :time_coverage_end = "2020-05-10T11:51:12Z" ;
NC_CHAR :keywords = "sea_water_salinity" ;
NC_CHAR :keywords_vocabulary = "CF:NetCDF COARDS Climate and Forecast Standard Names" ;
NC_CHAR :institution = "Your Institution" ;
NC_CHAR :publisher_name = "Publisher Name" ;
NC_CHAR :publisher_email = "publisher@email.com" ;
NC_CHAR :publisher_url = "publisher_url_here" ;
NC_CHAR :license = "https://creativecommons.org/licenses/by/4.0/" ;
NC_CHAR :Conventions = "ACDD-1.3, CF-1.8" ;
NC_CHAR :project = "Your project name" ;
}
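The loop above writes every attribute as NC_CHAR. If the list also holds numbers, the type can be chosen per value. A minimal sketch, assuming a hypothetical temporary file and storing any non-text value as NC_DOUBLE (a simplification; you may prefer other numeric types):

```r
library(RNetCDF)

# Hypothetical temporary file for illustration
path <- tempfile(fileext = ".nc")
nc <- create.nc(path)

# A mix of text and numeric global attributes
mixed_attributes <- list(
  title = "my title",
  geospatial_vertical_min = 0,
  geospatial_vertical_max = 100
)

for (key in names(mixed_attributes)) {
  value <- mixed_attributes[[key]]
  # Text is written as NC_CHAR; anything else is assumed to be numeric
  type <- if (is.character(value)) "NC_CHAR" else "NC_DOUBLE"
  att.put.nc(nc, "NC_GLOBAL", key, type, value)
}

title_read <- att.get.nc(nc, "NC_GLOBAL", "title")
vertical_max_read <- att.get.nc(nc, "NC_GLOBAL", "geospatial_vertical_max")
print(title_read)
print(vertical_max_read)

close.nc(nc)
```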
In this case, it makes sense to add some attributes based on information we have already provided.
For example, the geospatial limits can be derived from our data.
att.put.nc(ncds, "NC_GLOBAL", "geospatial_lat_min", "NC_FLOAT", min(latitudes))
att.put.nc(ncds, "NC_GLOBAL", "geospatial_lat_max", "NC_FLOAT", max(latitudes))
att.put.nc(ncds, "NC_GLOBAL", "geospatial_lon_min", "NC_FLOAT", min(longitudes))
att.put.nc(ncds, "NC_GLOBAL", "geospatial_lon_max", "NC_FLOAT", max(longitudes))
We can include the current time in the date_created and history attributes.
dtnow <- Sys.time()
attr(dtnow, "tzone") <- "UTC"
dt8601 <- format(dtnow, "%Y-%m-%dT%H:%M:%SZ") # date and time in ISO 8601 format
att.put.nc(ncds, "NC_GLOBAL", "date_created", "NC_CHAR", dt8601)
history <- paste("File created at", dt8601, "using RNetCDF by Luke Marsden")
att.put.nc(ncds, "NC_GLOBAL", "history", "NC_CHAR", history)
print.nc(ncds)
netcdf classic {
dimensions:
depth = 6 ;
latitude = 3 ;
longitude = 2 ;
variables:
NC_INT depth(depth) ;
NC_CHAR depth:standard_name = "depth" ;
NC_CHAR depth:long_name = "depth below sea level" ;
NC_CHAR depth:units = "meters" ;
NC_CHAR depth:coverage_content_type = "coordinate" ;
NC_CHAR depth:positive = "down" ;
NC_FLOAT latitude(latitude) ;
NC_CHAR latitude:standard_name = "latitude" ;
NC_CHAR latitude:long_name = "latitude" ;
NC_CHAR latitude:units = "degrees_north" ;
NC_CHAR latitude:coverage_content_type = "coordinate" ;
NC_FLOAT longitude(longitude) ;
NC_CHAR longitude:standard_name = "longitude" ;
NC_CHAR longitude:long_name = "longitude" ;
NC_CHAR longitude:units = "degrees_east" ;
NC_CHAR longitude:coverage_content_type = "coordinate" ;
NC_FLOAT salinity(depth, latitude, longitude) ;
NC_CHAR salinity:standard_name = "sea_water_salinity" ;
NC_CHAR salinity:long_name = "a description about the variable in your own words" ;
NC_CHAR salinity:units = "psu" ;
NC_CHAR salinity:coverage_content_type = "modelResult" ;
// global attributes:
NC_CHAR :id = "your_unique_id_here" ;
NC_CHAR :naming_authority = "institution that provides the id" ;
NC_CHAR :title = "my title" ;
NC_CHAR :summary = "analogous to an abstract in the paper, describing the data and how they were collected and processed" ;
NC_CHAR :creator_type = "person" ;
NC_CHAR :creator_name = "John Smith; Luke Marsden" ;
NC_CHAR :creator_email = "johns@unis.no; lukem@met.no" ;
NC_CHAR :creator_institution = "The University Centre in Svalbard (UNIS); Norwegian Meteorological Institute (MET)" ;
NC_CHAR :creator_url = "; https://orcid.org/0000-0002-9746-544X" ;
NC_CHAR :time_coverage_start = "2020-05-10T08:14:58Z" ;
NC_CHAR :time_coverage_end = "2020-05-10T11:51:12Z" ;
NC_CHAR :keywords = "sea_water_salinity" ;
NC_CHAR :keywords_vocabulary = "CF:NetCDF COARDS Climate and Forecast Standard Names" ;
NC_CHAR :institution = "Your Institution" ;
NC_CHAR :publisher_name = "Publisher Name" ;
NC_CHAR :publisher_email = "publisher@email.com" ;
NC_CHAR :publisher_url = "publisher_url_here" ;
NC_CHAR :license = "https://creativecommons.org/licenses/by/4.0/" ;
NC_CHAR :Conventions = "ACDD-1.3, CF-1.8" ;
NC_CHAR :project = "Your project name" ;
NC_FLOAT :geospatial_lat_min = 78.527099609375 ;
NC_FLOAT :geospatial_lat_max = 80.3261032104492 ;
NC_FLOAT :geospatial_lon_min = 28.5809993743896 ;
NC_FLOAT :geospatial_lon_max = 30.1515007019043 ;
NC_CHAR :date_created = "2024-10-14T08:13:25Z" ;
NC_CHAR :history = "File created at 2024-10-14T08:13:25Z using RNetCDF by Luke Marsden" ;
}
Finally, we close the file.
close.nc(ncds)
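Once a file is closed, it can be reopened read-only for a quick check of its contents before sharing it. A minimal sketch, assuming a small hypothetical file and a short, illustrative list of attribute names to look for (not the full set of requirements):

```r
library(RNetCDF)

# Write a small hypothetical file, then close it
path <- tempfile(fileext = ".nc")
nc <- create.nc(path)
dim.def.nc(nc, "depth", 3)
var.def.nc(nc, "depth", "NC_INT", "depth")
var.put.nc(nc, "depth", c(0, 10, 20))
att.put.nc(nc, "NC_GLOBAL", "title", "NC_CHAR", "my title")
close.nc(nc)

# Reopen the file (open.nc is read-only unless write = TRUE)
nc <- open.nc(path)
depths_read <- var.get.nc(nc, "depth")

# Illustrative list of global attributes to look for
wanted <- c("title", "summary")
present <- sapply(wanted, function(name) {
  # att.get.nc raises an error for a missing attribute, which is caught here
  !is.null(tryCatch(att.get.nc(nc, "NC_GLOBAL", name), error = function(e) NULL))
})
print(present)

close.nc(nc)
```

A check like this does not replace a compliance checker, but it catches missing attributes early.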
Checking your data#
Make sure you check your file thoroughly. Ideally, it should be reviewed by all co-authors, just like when publishing a paper.
There are also validators you can run your files through to make sure that your file is compliant with the ACDD and CF conventions before you publish it. For example: https://compliance.ioos.us/index.html
How to cite this course#
If you think this course contributed to the work you are doing, consider citing it in your list of references. Here is a recommended citation:
Marsden, L. (2024, May 31). NetCDF in R - from beginner to pro. Zenodo. https://doi.org/10.5281/zenodo.11400754
You can also navigate to the publication using the DOI above and export the citation in different styles and formats.