CSG Data Science and Machine Learning

A cooperation of TU Darmstadt and RWTH Aachen University

Cross-Sectional Group

The CSG Data Science and Machine Learning team provides training in machine learning and in process and data science techniques through resources such as videos, tutorials, and code.

Our primary goals are to guide users through machine learning and process mining and to offer the infrastructure needed to make these activities easily accessible on high-performance computing (HPC) systems. For process mining, we assist researchers in analyzing HPC data that includes a case identifier, an activity, and a timestamp.

We also offer tool support to facilitate these analyses. When integrating process mining with HPC, we focus on analyzing large scientific workflows (such as simulation, ML, AI, and PM workloads) executed on the HPC cluster. Our aim is to analyze performance, identify bottlenecks, and improve scheduling and planning.
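
As a rough illustration of the data format mentioned above, the following Python sketch builds a tiny event log in which each record carries a case identifier, an activity, and a timestamp, and then derives directly-follows counts and mean transition times from it. The job names, activity labels, and timestamps are invented for this example, and the helper `directly_follows` is hypothetical. It is not taken from SLURMminer or any other tool named on this page, and it only hints at how a slow transition might be spotted.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: (case identifier, activity, timestamp).
# The values are made up for illustration and are not real HPC data.
EVENTS = [
    ("job-1", "SUBMIT", "2024-01-01T08:00:00"),
    ("job-1", "START", "2024-01-01T08:10:00"),
    ("job-1", "END", "2024-01-01T09:10:00"),
    ("job-2", "SUBMIT", "2024-01-01T08:05:00"),
    ("job-2", "START", "2024-01-01T08:35:00"),
    ("job-2", "END", "2024-01-01T08:50:00"),
]


def directly_follows(events):
    """Return counts and mean durations (seconds) per directly-follows pair."""
    # Group events by case identifier.
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((datetime.fromisoformat(timestamp), activity))

    counts = Counter()
    total_seconds = defaultdict(float)
    for trace in traces.values():
        trace.sort()  # order the events of one case by timestamp
        # Walk over consecutive event pairs within the same case.
        for (t0, a0), (t1, a1) in zip(trace, trace[1:]):
            counts[(a0, a1)] += 1
            total_seconds[(a0, a1)] += (t1 - t0).total_seconds()

    mean_seconds = {pair: total_seconds[pair] / counts[pair] for pair in counts}
    return counts, mean_seconds


counts, mean_seconds = directly_follows(EVENTS)
# The transition with the largest mean duration is a candidate bottleneck.
slowest = max(mean_seconds, key=mean_seconds.get)

for pair in sorted(counts):
    print(pair, counts[pair], mean_seconds[pair])
print("slowest transition:", slowest)
```

The sketch uses only the Python standard library so that it stays self-contained. For real logs, a dedicated process mining library would likely be a better fit.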

If you have questions for other groups or general questions, for example about access to the HPC infrastructure, have a look at our support website.

Current research topics:

  • AutoML
  • Federated Learning
  • SLURMminer
  • SLURMminer Engine

Support activities:

  • Learning material for Process Mining & Machine Learning
  • Learning material to provide an understanding of the type of data needed by these techniques
  • Support in overcoming data size/quality/privacy issues (in cooperation with CSG Data Management)
  • Assistance in specific infrastructural challenges
  • Support for scientists with Machine Learning or Process Mining problems applied to their specific research field

Planned teaching activities:

  • Videos on specific aspects of data science/machine learning in HPC (for example, data parallelization in process mining and Python libraries for data science such as TensorFlow, Spark, and Dask)
  • Process Mining Summer School 2022 (Aachen)
  • Hands-on workshop on ML@HPC

Training offers 2024:

Members

Prof. Dr. Kristian Kersting

TU Darmstadt

Prof. Dr. Bastian Leibe

RWTH Aachen University

Viktor Pfanschilling

TU Darmstadt

Zahra Sadeghibogar

RWTH Aachen University

Jonas Seng

TU Darmstadt

Prof. Dr. Wil van der Aalst

RWTH Aachen University

Publications

2023

  • Treatment Effect Estimation to Guide Model Optimization in Continual Learning (Jonas Seng, Florian P. Busch, Matej Zečević, Moritz Willig), Continual Causality Bridge Program (@AAAI 2023)
  • Causal Concept Identification in Open World Environments (Moritz Willig, Matej Zečević, Jonas Seng, Florian P. Busch), Continual Causality Bridge Program (@AAAI 2023)
  • Continually Updating Neural Causal Models (Florian P. Busch, Jonas Seng, Moritz Willig, Matej Zečević), Continual Causality Bridge Program (@AAAI 2023)
  • Continual Causal Abstractions (Matej Zečević, Moritz Willig, Florian P. Busch, Jonas Seng), Continual Causality Bridge Program (@AAAI 2023)
  • SLURMminer: A Tool for SLURM System Analysis with Process Mining (Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil MP van der Aalst), BPM 23
  • Exploring SLURM Logs through Process Mining: Insights into Scientific Workflows (Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil MP van der Aalst), BPM 23

2024

  • Learning Large DAGs is Harder Than You Think (Jonas Seng, Matej Zečević, Devendra Singh Dhami, Kristian Kersting), ICLR 24
  • Bi-Level One-Shot Architecture Search for Probabilistic Time Series Forecasting (Jonas Seng, Fabian Kalter, Zhongjie Yu, Fabrizio Ventola, Kristian Kersting), AutoML Conf 24
  • Psinet: Efficient Causal Modelling at Scale (Florian Busch, Moritz Willig, Jonas Seng, Kristian Kersting, Devendra Singh Dhami), PGM 24
  • Hyperparameter Optimization via Interacting with Probabilistic Circuits (Jonas Seng, Fabrizio Ventola, Zhongjie Yu, Kristian Kersting), UAI 24 (TPM Workshop)
  • FEATHERS: Federated Architecture and Hyperparameter Search (Jonas Seng, Pooja Prasad, Devendra Singh Dhami, Martin Mundt, Kristian Kersting), AutoML Conf 24 (Workshop)