isimip_utils.xarray

Functions for working with xarray datasets for ISIMIP data.

init_dataset(lon=720, lat=360, time=None, dims=None, attrs=None, **variables)

Initialize a new xarray dataset with standard ISIMIP dimensions.

Parameters:

Name Type Description Default
lon int | ndarray

Number of longitude points, or longitude array, or None to omit (default: 720).

720
lat int | ndarray

Number of latitude points, or latitude array, or None to omit (default: 360).

360
time int | ndarray

Number of time steps, or time array, or None to omit time dimension (default: None).

None
attrs dict

Dictionary of attributes for variables and global attributes.

None
dims list

List of dimension names (default: time, lat, lon).

None
**variables ndarray

Data variables to include in the dataset.

{}

Returns:

Type Description
Dataset

Initialized xarray Dataset with coordinates and data variables.
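To make the idea of "standard ISIMIP dimensions" concrete, here is a minimal sketch of a comparable half-degree dataset built with plain xarray. The helper name `make_grid_dataset`, the cell-centre coordinate values, and the north-to-south latitude order are assumptions for illustration; the coordinates that `init_dataset` actually generates are not stated above.

```python
import numpy as np
import xarray as xr


def make_grid_dataset(n_lon=720, n_lat=360, **variables):
    """Hypothetical helper: build a global lat/lon dataset on a regular grid."""
    lon_step = 360.0 / n_lon
    lat_step = 180.0 / n_lat
    # Assumption: coordinates mark cell centres, latitude runs north to south.
    lon = -180.0 + lon_step * (np.arange(n_lon) + 0.5)
    lat = 90.0 - lat_step * (np.arange(n_lat) + 0.5)
    # Every keyword argument becomes a data variable on (lat, lon).
    data_vars = {name: (("lat", "lon"), values) for name, values in variables.items()}
    return xr.Dataset(data_vars=data_vars, coords={"lon": lon, "lat": lat})


ds = make_grid_dataset(tas=np.zeros((360, 720)))
```

With the defaults of 720 and 360 points this gives a 0.5 degree grid, which is why those two numbers appear as defaults in the signature.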

open_dataset(path, decode_cf=True, load=False)

Open a NetCDF dataset using xarray.

Parameters:

Name Type Description Default
path str | Path

Path to the NetCDF file.

required
decode_cf bool

Whether to decode CF conventions (default: True).

True
load bool

Whether to load data into memory immediately (default: False).

False

Returns:

Type Description
Dataset

Xarray Dataset object.

Note

Handles non-standard time units such as growing seasons and years by converting them to common_years with a 365_day calendar. Time units given in months are read with the 360_day calendar.
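The unit handling described in the note can be pictured as a small rewrite of the CF `units` string before decoding. The following is only a sketch: the function name `normalize_time_units` is hypothetical, and the exact unit spellings matched here are assumptions, not the library's actual implementation.

```python
def normalize_time_units(units):
    """Hypothetical helper: return (units, calendar) for a CF time units string.

    A calendar of None means "keep whatever calendar the file declares".
    """
    unit, separator, reference = units.partition(" since ")
    if not separator:
        raise ValueError(f"not a CF time units string: {units!r}")

    unit = unit.strip().lower()
    if unit in ("years", "growing seasons"):
        # Fixed-length years: one common_year is exactly 365 days.
        return f"common_years since {reference}", "365_day"
    if unit in ("month", "months"):
        # Months only have a fixed length in a 360_day calendar (30 days each).
        return f"{unit} since {reference}", "360_day"
    # Standard units (days, hours, ...) are left untouched.
    return units, None


units, calendar = normalize_time_units("growing seasons since 1601-1-1 00:00:00")
```

The design point is that both replacements give every step a fixed length, so the numeric time axis can be decoded without ambiguity.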

load_dataset(path, decode_cf=True)

Open a NetCDF dataset using xarray and load data into memory immediately.

Parameters:

Name Type Description Default
path str | Path

Path to the NetCDF file.

required
decode_cf bool

Whether to decode CF conventions (default: True).

True

Returns:

Type Description
Dataset

Xarray Dataset object.

Note

Handles non-standard time units such as growing seasons and years by converting them to common_years with a 365_day calendar. Time units given in months are read with the 360_day calendar.

This is a shortcut for open_dataset(path, decode_cf, load=True).

write_dataset(ds, path)

Write an xarray dataset to a NetCDF file.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset to write.

required
path str | Path

Path where the NetCDF file will be written.

required

Note

Automatically adds fill values, orders variables, adds compression and sets time as unlimited dimension.
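For readers who want to see what these write-time settings amount to in plain xarray terms, the sketch below builds a per-variable `encoding` dictionary. The helper `build_encoding` is hypothetical, and the fill value of 1e20 is an assumption; the value that `write_dataset` uses is not stated above.

```python
def build_encoding(data_vars, coords, complevel=5, fill_value=1e20):
    """Hypothetical helper: assemble an xarray-style encoding dictionary."""
    encoding = {}
    for name in coords:
        # Coordinates are written without a fill value.
        encoding[name] = {"_FillValue": None}
    for name in data_vars:
        # Data variables get a fill value and zlib compression.
        encoding[name] = {
            "_FillValue": fill_value,
            "zlib": True,
            "complevel": complevel,
        }
    return encoding


encoding = build_encoding(data_vars=["tas"], coords=["time", "lat", "lon"])
```

A dictionary like this could be passed to xarray's own writer as `ds.to_netcdf(path, encoding=encoding, unlimited_dims=["time"])`, which also marks time as the unlimited dimension.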

order_variables(ds)

Reorder dataset variables with coordinates first, then data variables.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset to reorder.

required

Returns:

Type Description
Dataset

Dataset with reordered variables.

get_attrs(ds)

Get all attributes from coordinates and data variables.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset.

required

Returns:

Type Description
dict

Dictionary mapping variable names to their attributes.

set_attrs(ds, attrs)

Set attributes on coordinates and data variables.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset to modify.

required
attrs dict

Dictionary mapping variable names to their attributes.

required

Returns:

Type Description
Dataset

Modified dataset with updated attributes.

set_fill_value_to_nan(ds)

Replace fill values with NaN in data variables. This is only needed for datasets that are read with decode_cf=False and whose encoding does not contain _FillValue.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset to modify.

required

Returns:

Type Description
Dataset

Dataset with fill values replaced by NaN.

set_nan_to_fill_value(ds)

Replace NaN values with fill values in data variables. This is only needed for datasets that are read with decode_cf=False and whose encoding does not contain _FillValue.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset to modify.

required

Returns:

Type Description
Dataset

Dataset with NaN values replaced by fill values.
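The two fill-value helpers above are inverses of each other. A rough sketch of the underlying array operation, using NumPy only, is shown here; the function names and the fill value of 1e20 are assumptions for illustration and do not reflect how the library reads the fill value from a variable's attributes.

```python
import numpy as np

FILL_VALUE = 1e20  # assumed fill value, for illustration only


def fill_value_to_nan(values, fill_value=FILL_VALUE):
    """Hypothetical helper: mask cells equal to the fill value with NaN."""
    values = np.asarray(values, dtype=np.float64)
    return np.where(values == fill_value, np.nan, values)


def nan_to_fill_value(values, fill_value=FILL_VALUE):
    """Hypothetical helper: write the fill value back into NaN cells."""
    values = np.asarray(values, dtype=np.float64)
    return np.where(np.isnan(values), fill_value, values)


raw = np.array([1.5, 1e20, 3.0])
masked = fill_value_to_nan(raw)        # the middle cell becomes NaN
restored = nan_to_fill_value(masked)   # round trip back to the raw values
```

Masking with NaN before computing statistics keeps the very large fill value from distorting means or sums.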

remove_fill_value_from_coords(ds)

Remove _FillValue and missing_value attributes from the coords.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset to modify.

required

Returns:

Type Description
Dataset

Dataset with the fill value attributes removed from the coordinates.

add_fill_value_to_data_vars(ds)

Add _FillValue and missing_value to data_vars if no encoding is present. This is the case for a newly created Dataset.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset to modify.

required

Returns:

Type Description
Dataset

Dataset with encoding added for the data_vars.

add_compression_to_data_vars(ds, complevel=5)

Add compression to data variables.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset to modify.

required
complevel int

Compression level (default: 5).

5

Returns:

Type Description
Dataset

Dataset with updated encoding.

compute_time(ds, timestamp)

Convert a datetime to numeric time value for dataset.

Parameters:

Name Type Description Default
ds Dataset

Dataset with time coordinate containing units and calendar.

required
timestamp datetime | date | None

Timestamp to convert, or None.

required

Returns:

Type Description
float | None

Numeric time value in dataset's units, or None if timestamp is None.

compute_offset(ds1, ds2)

Compute time offset between two datasets with different time units.

Parameters:

Name Type Description Default
ds1 Dataset

First dataset with time coordinate.

required
ds2 Dataset

Second dataset with time coordinate.

required

Returns:

Type Description
DataArray | None

Time offset to apply to ds2, or None if units/calendars match.
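As an illustration of what such an offset means, the sketch below compares two `days since ...` unit strings using only the standard library. It assumes both datasets use a Gregorian-type calendar, ignores the time-of-day part of the reference, and returns a plain integer rather than a DataArray; the helper names and the sign convention (a value to add to the second time axis) are assumptions.

```python
from datetime import datetime


def parse_reference(units):
    """Split e.g. 'days since 1601-1-1 00:00:00' into unit and reference date."""
    unit, _, reference = units.partition(" since ")
    date_part = reference.split()[0]  # drop an optional time-of-day part
    year, month, day = (int(part) for part in date_part.split("-"))
    return unit.strip(), datetime(year, month, day)


def days_offset(units1, units2):
    """Hypothetical helper: days to add to axis 2 to express it in units1."""
    unit1, ref1 = parse_reference(units1)
    unit2, ref2 = parse_reference(units2)
    if unit1 != "days" or unit2 != "days":
        raise ValueError("this sketch only handles 'days since ...' units")
    if ref1 == ref2:
        return None  # identical units: nothing to shift
    return (ref2 - ref1).days


offset = days_offset("days since 1601-1-1 00:00:00", "days since 1661-1-1 00:00:00")
```

Adding the offset to the second dataset's numeric time values places both time axes on the same reference date, so they can be compared or concatenated.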

create_mask(ds, df, layer)

Create a spatial mask from a geometry layer.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset with lat/lon coordinates.

required
df DataFrame

GeoDataFrame with geometry column.

required
layer int

Index of the layer to use from the GeoDataFrame.

required

Returns:

Type Description
Dataset

Xarray dataset with a mask variable clipped to the geometry.

convert_time(time, units='days since 1601-1-1 00:00:00', calendar='proleptic_gregorian')

Convert a time coordinate array to np.float64 using cftime.date2num.

Parameters:

Name Type Description Default
time ndarray

Time coordinate array.

required
units str

Units for the time coordinate (default: days since 1601-1-1 00:00:00).

'days since 1601-1-1 00:00:00'
calendar str

Calendar type for time coordinate (default: proleptic_gregorian).

'proleptic_gregorian'

Returns:

Name Type Description
time ndarray

Time coordinate array as np.float64.
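To show what the default units mean numerically, here is a small standard-library sketch that computes days since 1601-01-01. It stands in for cftime.date2num only in the default case: Python's `datetime` uses the proleptic Gregorian calendar, which matches the default calendar above. The function name `to_days_since_reference` is hypothetical.

```python
from datetime import datetime

REFERENCE = datetime(1601, 1, 1)  # corresponds to 'days since 1601-1-1 00:00:00'
SECONDS_PER_DAY = 86400.0


def to_days_since_reference(timestamps, reference=REFERENCE):
    """Hypothetical helper: express datetimes as fractional days since reference."""
    return [
        (timestamp - reference).total_seconds() / SECONDS_PER_DAY
        for timestamp in timestamps
    ]


values = to_days_since_reference(
    [datetime(1601, 1, 1), datetime(1601, 1, 2, 12), datetime(1602, 1, 1)]
)
```

Other calendars such as 365_day or 360_day cannot be reproduced with `datetime` arithmetic, which is one reason to rely on cftime for the general case.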

to_dataframe(ds)

Convert an xarray Dataset to a pandas DataFrame.

Parameters:

Name Type Description Default
ds Dataset

Xarray Dataset to convert.

required

Returns:

Type Description
DataFrame

Pandas DataFrame with coordinates and data variables as columns. Attributes are preserved in df.attrs['coords'] and df.attrs['data_vars'].

Note

Time coordinates are converted to datetime64[ns] format. Data variables are converted to float64.