(continuation of discussion from this Discord thread)
Currently, downloading ERA5 data often takes more than 4 hours, even for small cutouts. This is caused by long queues on the CDS side. Besides the annoying wait, this causes two issues:
- New users may think something is wrong with their code and waste time debugging it, because this behavior is not documented anywhere.
- Many projects generate cutouts and store them in git/Zenodo to avoid download queues, or implement unnecessary caching logic, which is not great architecturally.
Detailed Description
Add another source of ERA5 data that does not have a queue. There are many mirrors of the dataset in various formats on the Internet, and most of them do not have a long queue.
Context
I am a new user of atlite and wasted a day debugging my caching code before I realised these downloads are supposed to take this long.
Possible Implementation
For backward compatibility, I suggest keeping the default `module="era5"` intact, still backed by the CDS data store, and adding new modules (e.g. `era5-ncar`, `era5-timeseries`) to give users a choice of sources. The API would then be:
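```python
import atlite

cutout = atlite.Cutout(
    path="spain-2024.nc",
    module="era5-ncar",  # <-- only change from "era5" (proposed module, not yet in atlite)
    x=slice(-10, 5),
    y=slice(35, 44),
    time="2024",  # assumed: full year 2024, matching the file name
)
```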
I did some research into sources of ERA5. There are various options, many of them in the xarray-friendly Zarr format. Unfortunately, most of them store the data in chunks of 1 hour covering the whole world, which is not great if one only wants a cutout for, e.g., a single country: a full year (8760 hours) is ~200 GB of data. I found three sources that store the data in a format more useful for atlite:
- Earthmover Arraylake: stores ERA5 as Zarr with small chunks. It is free, but hosted by a relatively small company, so it may not be a great option long-term.
- NSF NCAR: backed by the US government, stores ERA5 in three different formats, with efficient subsetting available.
- CDS ERA5 hourly timeseries version: a new CDS product (introduced Feb 2026) that allows efficient download of a timeseries for a single point on a 0.25° × 0.25° grid. It uses Zarr under the hood. CDS says the API is currently unstable and the data format is subject to change, but it is very promising for small downloads.
I suggest implementing options 2 and 3 (NSF NCAR and the CDS timeseries product).
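The arithmetic behind the chunking argument can be sketched for the Spain cutout from the example (`x=slice(-10, 5)`, `y=slice(35, 44)`). This assumes the regular 0.25° global grid (1440 × 721 points), inclusive slice bounds, and that the ~200 GB/year estimate for global hourly chunks is spread evenly over grid points, which compression makes only approximate. It also sketches what a point-based source implies: one timeseries per grid node, with a single site mapped to its nearest node. Whether the CDS timeseries service snaps coordinates this way itself is an assumption to check against its documentation; `grid_points` and `snap` are hypothetical helpers.

```python
import math

RES = 0.25  # ERA5 grid spacing in degrees
GLOBAL_POINTS = 1440 * 721  # regular 0.25° global grid: 1440 lons x 721 lats


def grid_points(lo: float, hi: float, res: float = RES) -> int:
    """Number of grid nodes from lo to hi, both ends inclusive."""
    return round((hi - lo) / res) + 1


def snap(value: float, res: float = RES) -> float:
    """Round a coordinate to the nearest grid node (ties round up)."""
    return math.floor(value / res + 0.5) * res


# Cutout from the example: x=slice(-10, 5), y=slice(35, 44).
cutout_points = grid_points(-10, 5) * grid_points(35, 44)

# Share of each global hourly chunk that the cutout actually uses.
fraction = cutout_points / GLOBAL_POINTS

# Scale the ~200 GB/year estimate for global chunks, assuming data volume
# is spread evenly over grid points (compression makes this approximate).
useful_gb = 200 * fraction

# A point-based source needs one timeseries per grid node in the box,
# and a single site maps to its nearest node.
site = (snap(-3.7038), snap(40.4168))  # roughly Madrid (lon, lat)

print(f"{cutout_points} of {GLOBAL_POINTS} points per chunk ({fraction:.2%})")
print(f"~{useful_gb:.2f} GB of ~200 GB is inside the cutout")
print(f"{cutout_points} point timeseries for the cutout; nearest node to the site: {site}")
```

Under these assumptions, well under 1% of every downloaded global chunk ends up in the cutout, which is why small-chunk or point-based sources fit atlite's typical use much better.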