CSG Data Science and Machine Learning
and RWTH Aachen University
Cross-Sectional Group
The CSG Data Science and Machine Learning team provides training in machine learning and process and data science techniques through resources like videos, tutorials, and code.
Our primary goals are to guide users through machine learning and process mining and to offer the necessary infrastructure to make these activities easily accessible on HPC (High-Performance Computing) systems. For process mining, we assist researchers in analyzing HPC data that includes a case identifier, an activity, and a timestamp.
We also offer tool support to facilitate these analyses. When integrating process mining with HPC, we focus on analyzing large scientific workflows (such as simulation, ML, AI, and PM) executed on the HPC cluster. Our aim is to analyze performance, identify bottlenecks, and improve scheduling and planning.
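To illustrate the kind of data meant here, the following is a minimal sketch of an event log with a case identifier, an activity, and a timestamp, together with a simple directly-follows analysis of the sort commonly used in process mining. The job IDs, activity names, timestamps, and function names are all invented for illustration; this is not the output format of SLURM or of any of our tools, only an assumed shape for such data.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: (case identifier, activity, timestamp).
# Job IDs, activity names, and times are invented for illustration.
EVENTS = [
    ("job-1", "SUBMIT", "2024-01-01T10:00:00"),
    ("job-1", "START", "2024-01-01T10:05:00"),
    ("job-1", "END", "2024-01-01T10:35:00"),
    ("job-2", "SUBMIT", "2024-01-01T10:01:00"),
    ("job-2", "START", "2024-01-01T10:21:00"),
    ("job-2", "END", "2024-01-01T10:31:00"),
]


def build_traces(events):
    """Group events by case and sort each case by timestamp."""
    traces = defaultdict(list)
    for case_id, activity, ts in events:
        traces[case_id].append((datetime.fromisoformat(ts), activity))
    return {case: sorted(evts) for case, evts in traces.items()}


def directly_follows(traces):
    """Count directly-follows pairs and compute their mean duration in seconds."""
    counts = Counter()
    total_seconds = defaultdict(float)
    for evts in traces.values():
        # Pair each event with its successor within the same case.
        for (t1, a), (t2, b) in zip(evts, evts[1:]):
            counts[(a, b)] += 1
            total_seconds[(a, b)] += (t2 - t1).total_seconds()
    mean_seconds = {pair: total_seconds[pair] / n for pair, n in counts.items()}
    return counts, mean_seconds


traces = build_traces(EVENTS)
counts, mean_seconds = directly_follows(traces)
for pair, n in counts.items():
    print(pair, n, mean_seconds[pair])
```

In this toy log, a long mean duration between two activities (for example, between submission and start) would point to a possible waiting-time bottleneck. Real analyses would typically rely on a dedicated process mining library rather than hand-written counting.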
If you have questions for other groups, or general questions such as how to get access to the HPC infrastructure, have a look at our support website.
Current research topics:
- AutoML
- Federated Learning
- SLURMminer
- SLURMminer Engine
Support activities:
- Learning material for Process Mining & Machine Learning
- Learning material to provide an understanding of the type of data needed by these techniques
- Support in overcoming data size/quality/privacy issues (in cooperation with CSG Data Management)
- Assistance in specific infrastructural challenges
- Support for scientists with Machine Learning or Process Mining problems applied to their specific research field
Planned teaching activities:
- Videos on specific aspects of data science/machine learning in HPC (data parallelization in process mining; Python libraries for data science such as TensorFlow, Spark, and Dask)
- Process Mining Summer School 2022 (Aachen)
- Hands-on workshop on ML@HPC
Training offers 2024:
Publications
2023
- Treatment Effect Estimation to Guide Model Optimization in Continual Learning (Jonas Seng, Florian P. Busch, Matej Zečević, Moritz Willig), Continual Causality Bridge Program (@AAAI 2023)
- Causal Concept Identification in Open World Environments (Moritz Willig, Matej Zečević, Jonas Seng, Florian P. Busch), Continual Causality Bridge Program (@AAAI 2023)
- SLURMminer: A Tool for SLURM System Analysis with Process Mining (Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil MP van der Aalst), BPM 23
2024
- Learning Large DAGs is Harder Than You Think (Jonas Seng, Matej Zečević, Devendra Singh Dhami, Kristian Kersting), ICLR 24
- Bi-Level One-Shot Architecture Search for Probabilistic Time Series Forecasting (Jonas Seng, Fabian Kalter, Zhongjie Yu, Fabrizio Ventola, Kristian Kersting), AutoML Conf 24
- Psinet: Efficient Causal Modelling at Scale (Florian Busch, Moritz Willig, Jonas Seng, Kristian Kersting, Devendra Singh Dhami), PGM 24
- Continually Updating Neural Causal Models (Florian P. Busch, Jonas Seng, Moritz Willig, Matej Zečević), Continual Causality Bridge Program (@AAAI 2023)
- Continual Causal Abstractions (Matej Zečević, Moritz Willig, Florian P. Busch, Jonas Seng), Continual Causality Bridge Program (@AAAI 2023)
- Exploring SLURM Logs through Process Mining: Insights into Scientific Workflows (Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil MP van der Aalst), BPM 23
- Hyperparameter Optimization via Interacting with Probabilistic Circuits (Jonas Seng, Fabrizio Ventola, Zhongjie Yu, Kristian Kersting), UAI 24 (TPM Workshop)
- FEATHERS: Federated Architecture and Hyperparameter Search (Jonas Seng, Pooja Prasad, Devendra Singh Dhami, Martin Mundt, Kristian Kersting), AutoML Conf 24 (Workshop)