NHR4CES Community Workshop 2025
Materials Science with Advanced Data Management and Data Science Techniques
We invite you to our NHR4CES Community Workshop 2025! This year’s topic is Materials Science with Advanced Data Management and Data Science Techniques. The online workshop is organized by SDL Materials Design, CSG Data Management and CSG Data Science and Machine Learning. The workshop will take place on May 07, 1pm-5pm and May 08, 9am-1pm.
With increasing computational power, an abundance of data has become available in many research fields. To handle this ever-growing amount of data, tools such as data management systems, machine learning, and workflow managers are receiving increasing attention. In materials science, such approaches make it possible to investigate a wide range of materials and their properties systematically. Workflow managers are used in high-throughput calculations, which generate huge amounts of data. However, high-throughput calculations are limited by structure size: structures with more than 10^3 atoms are constrained by the available computational resources. Researchers therefore use the generated data and/or intelligent data mining as input to machine learning techniques to investigate material properties beyond the limitations imposed by computational resources.
Additionally, collecting rich metadata at every step of the workflow makes it easy for other researchers to reproduce and reuse the results. This workshop will cover major parts of the data life cycle: generating data via automated workflows, collecting it with data management solutions, and processing and/or reusing it through machine learning, all while following the FAIR data principles.

What you need to know
Date: May 07-08, 2025
Language: English
Capacity: unlimited
Format: online
All about our speakers and their talks

Jan Janssen
is the group leader for materials informatics in the computational materials design department of Prof. Neugebauer at the Max Planck Institute for Sustainable Materials. In his work, he aims to combine simulation and experiment in the same machine-learning-enabled workflows to accelerate the discovery of novel sustainable materials. On a technical level, Jan’s interests range from training large language models to build workflows to benchmarking these workflows on the world’s largest supercomputers. He is a maintainer of over 900 materials-informatics-related software packages for conda-forge and a frequent contributor to open-source projects on GitHub.
Jan Janssen's talk: Pyiron - Workflows for data-driven Materials Science

Jan Micha Bodensohn
is a doctoral student supervised by Prof. Carsten Binnig at the Data and AI Systems Lab of the Technical University of Darmstadt and works as a researcher for the German Research Center for Artificial Intelligence (DFKI). His research centers on the automation of data engineering tasks with foundation models. As part of his work, he evaluates how Large Language Models (LLMs) can solve classical data engineering tasks on real-world tabular data such as enterprise databases. Micha joined the Data and AI Systems Lab as a doctoral student in 2023 after completing his Bachelor’s and Master’s degrees in Computer Science at TU Darmstadt. From 2020 to 2022, he worked as a student research assistant at the Data Management Lab at TU Darmstadt, focusing on data exploration and information extraction from text.

Liane Vogel
is a doctoral student supervised by Prof. Carsten Binnig at the Data and AI Systems Lab of TU Darmstadt. Her research focuses on building foundation models for tabular data and relational databases to automate data engineering tasks. She aims to build models that can take the full relational structure into account and find suitable representations for multi-table data. She started at the Data and AI Systems Lab in 2021 after obtaining Bachelor’s and Master’s degrees in Computer Science from TU Darmstadt. From 2018 to 2020, she worked as a student research assistant at the UKP Lab at TU Darmstadt in the area of argument search and crowdsourced annotation studies.
Micha Bodensohn's and Liane Vogel's talk: Wrangling Tabular Data with LLMs: What’s Possible and What’s Not
Data-driven tasks like empirical analysis and machine learning often come with substantial data preparation overheads. This is especially true for tabular data, which plays a central role in many domains like science and business. The required tables must first be found, integrated, cleaned, and formatted for each specific use case.
Recent research on Large Language Models (LLMs) for data engineering shows that LLMs can solve many of these tasks out-of-the-box without requiring expensive, specialized solutions. Yet these improvements on scientific benchmarks often fail to translate into practice. In our talk, we demystify what LLMs can and cannot do when it comes to wrangling tabular data.