Publish data from Lake

Lake is FMI’s new general-purpose data storage server. It can be accessed from the FMI servers as well as from anywhere on the internet. CSC uses the same system under the name Allas, so many researchers are already familiar with it. From a technical point of view, Lake is a modern object storage system with an S3 interface. This means that instead of files, the data is stored as objects in buckets, so using it is a bit different from using a normal file system. Object storage of this kind is widely used in the IT world; AWS, for example, relies on it heavily.
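
To make the difference from a file system concrete, here is a toy in-memory model of a bucket. It is only an illustration, not Lake's or any S3 client's API: the key names are made up, and the point is that a bucket is a flat mapping from key to content, where a "directory" is nothing more than a shared key prefix.

```python
# Toy model: a bucket is a flat mapping from object key to object content.
# The keys below are invented for illustration; there are no real directories,
# "2023/01/" is simply the beginning of a key.
bucket = {
    "2023/01/temperature.nc": b"...",
    "2023/02/temperature.nc": b"...",
    "readme.txt": b"sample data set",
}


def list_keys(bucket, prefix=""):
    """Return the keys that start with the given prefix, sorted."""
    return sorted(key for key in bucket if key.startswith(prefix))


# Listing "a directory" is really a prefix filter over all keys.
print(list_keys(bucket, "2023/"))

# Writing to an existing key replaces the whole object; there is no partial edit.
bucket["readme.txt"] = b"sample data set, version 2"
```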

Pros:

  • The object storage can handle practically any static data.
  • The data can be accessed from anywhere using its URL, e.g. https://era5-data.lake.fmi.fi/index.html
  • The data can have different levels of access control.
  • A lifecycle policy can be set for the data.
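
As a minimal sketch of URL access, the snippet below builds an object URL and downloads it with the Python standard library. It assumes that the example URL above follows the pattern https://<bucket>.lake.fmi.fi/<key> (so that "era5-data" is the bucket and "index.html" the key) and that the object is publicly readable; both are inferences from that one example, and the function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the domain and URL pattern are inferred from the example URL above.
LAKE_DOMAIN = "lake.fmi.fi"


def object_url(bucket, key):
    """Build the URL of an object, percent-encoding the key where needed."""
    return f"https://{bucket}.{LAKE_DOMAIN}/{quote(key)}"


def fetch(bucket, key, timeout=30):
    """Download a publicly readable object and return its content as bytes."""
    with urlopen(object_url(bucket, key), timeout=timeout) as response:
        return response.read()


print(object_url("era5-data", "index.html"))
# prints: https://era5-data.lake.fmi.fi/index.html
```

Objects with restricted access need credentials, so a plain URL download like this would not work for them.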

Cons:

  • Specific tools are required to use the object storage. It cannot be properly mounted for local disk-like usage; some tools can do this, but they have their limitations.
  • It is unsuitable for files that change constantly during their lifetime (e.g. most SQL databases).
  • The data cannot be modified in place while it is in Lake. It must be downloaded to a server for processing, and the previous version is then replaced with the new one.
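
The download, process and replace cycle from the last point can be sketched as follows. The function expects a client object with get_object and put_object methods, which is how the boto3 S3 client is called; the bucket and key names are made up, and InMemoryClient is a hypothetical stand-in so that the sketch runs without Lake credentials.

```python
import io


def update_object(client, bucket, key, transform):
    """Download an object, process it locally, and replace it under the same key."""
    original = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    updated = transform(original)
    # The upload overwrites the previous version as a whole.
    client.put_object(Bucket=bucket, Key=key, Body=updated)
    return updated


class InMemoryClient:
    """Hypothetical stand-in that mimics two S3 client methods for a dry run."""

    def __init__(self):
        self.objects = {}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body


client = InMemoryClient()
client.put_object(Bucket="my-bucket", Key="obs.csv", Body=b"station,temp\n")

# Append one row: the whole object is fetched, changed locally, and written back.
update_object(client, "my-bucket", "obs.csv", lambda data: data + b"Helsinki,4.2\n")
```

With real data, the stand-in would be replaced by an S3 client such as one created with boto3.client("s3", ...), using the endpoint and credentials given in the Lake user instructions.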


User instructions are available in the wiki (in Finnish): https://wiki.fmi.fi/pages/viewpage.action?pageId=82171760