13. March 2024

Data Transfer from and to S3 storages

At NHR4CES at RWTH Aachen University, we use a large S3 storage system as the backend of our research data management (RDM) platform Coscine. S3, the Simple Storage Service, provides scalable, secure, and durable storage for large volumes of data. It lets researchers store and retrieve their data with high availability and performance, which makes it well suited to RDM workflows. In addition, S3's support for metadata and versioning helps researchers organize and manage their data reliably and efficiently.

Since researchers are not always familiar with different types of storage systems, we provide documentation and best practices to help them move their data to and from the S3 storage. We also provide reference values, so that researchers can compare the transfer speeds they achieve with the speeds we achieve in an ideal scenario. Using these reference values, we have already identified several bottlenecks in some institutes' infrastructure, which were addressed shortly afterwards. These tests were then extended in the "NHR future project: data management" to cover the S3 storage systems of the different project partners. This gave us a better overview of how the various storage systems compare and allowed us to make recommendations. Within this project, we also compared different data transfer tools with each other. Since the beginning of 2024, we have regularly extended our tests to new storage systems and have helped several project partners identify issues with their observed data transfer rates.
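As a rough illustration of how an achieved transfer rate can be compared with a reference value, here is a minimal Python sketch, assuming the boto3 library and an S3-compatible endpoint. The function names, the multipart settings, and expressing the result as a fraction of a reference rate are illustrative choices of this sketch, not the procedure behind our published reference values.

```python
import os
import time


def throughput_mb_s(num_bytes: int, seconds: float) -> float:
    """Return the transfer rate in MB/s (decimal units: 1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return num_bytes / seconds / 1e6


def fraction_of_reference(measured_mb_s: float, reference_mb_s: float) -> float:
    """Return the measured rate as a fraction of a reference rate."""
    if reference_mb_s <= 0:
        raise ValueError("reference rate must be positive")
    return measured_mb_s / reference_mb_s


def timed_upload(path: str, bucket: str, key: str, endpoint_url: str) -> float:
    """Upload one file to an S3-compatible endpoint and return the rate in MB/s.

    boto3 reads credentials from its usual sources (environment variables,
    ~/.aws/credentials, ...). It is imported here so that the pure helpers
    above can be used without boto3 installed.
    """
    import boto3
    from boto3.s3.transfer import TransferConfig

    client = boto3.client("s3", endpoint_url=endpoint_url)
    # Multipart settings: split files larger than 64 MiB into 64 MiB parts and
    # upload up to 16 parts in parallel. Illustrative values, not tuned ones.
    config = TransferConfig(
        multipart_threshold=64 * 1024**2,
        multipart_chunksize=64 * 1024**2,
        max_concurrency=16,
    )
    size = os.path.getsize(path)
    start = time.perf_counter()
    client.upload_file(path, bucket, key, Config=config)
    elapsed = time.perf_counter() - start
    return throughput_mb_s(size, elapsed)
```

For example, `timed_upload("testfile.bin", "my-bucket", "bench/testfile.bin", "https://s3.example.org")` followed by `fraction_of_reference(rate, 500.0)` would report the achieved rate relative to a reference of 500 MB/s; the file, bucket, endpoint, and reference value are all hypothetical placeholders. The measured time includes connection setup, so large test files and repeated runs give more meaningful numbers.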

In addition, we work specifically on improving data transfer rates between the CLAIX-2023 HPC cluster and our own S3 storage systems. This gives users of our infrastructure convenient access to both our HPC cluster and our research data management platform, in a way that also scales well with large amounts of data. We analyze transfer rates not only within RWTH but also to other universities' S3 solutions, and we continue to add new S3 storage systems to our tests.

At NHR4CES in Aachen, we developed and provide the research data management platform Coscine. It is intended for hot and warm data and is often closely linked to ongoing HPC projects. The underlying storage system is an object storage that can be accessed via the S3 protocol. Since many HPC projects analyze or generate large amounts of data, analyzing and improving data transfer rates is a key challenge.
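A quick back-of-the-envelope calculation shows why transfer rates matter at this scale: at a sustained 100 MB/s, moving 10 TB takes about 28 hours, while at 1000 MB/s it takes under 3 hours. The small Python sketch below does this arithmetic; the dataset size and rates are hypothetical values for illustration, not measurements from our tests.

```python
def transfer_hours(data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move data_tb terabytes at a sustained rate_mb_s MB/s.

    Uses decimal units: 1 TB = 10**6 MB.
    """
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    return data_tb * 1e6 / rate_mb_s / 3600


if __name__ == "__main__":
    # Hypothetical dataset size and rates, chosen only for illustration.
    for rate in (100, 500, 1000):
        print(f"10 TB at {rate:>4} MB/s: {transfer_hours(10, rate):5.1f} h")
```

The estimate assumes the rate is sustained for the whole transfer; in practice, small files, per-request overhead, and network bottlenecks usually lower the effective rate.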

Contact our CSG Data Management to learn more!

Copyright: Image by fullvector on Freepik