NHR4CES Community Workshop:
Performance Engineering for Numerical Methods in Computational Fluid Dynamics
After last year’s Community Workshop covered Machine Learning in Computational Fluid Dynamics, this year’s topic was Performance Engineering for Numerical Methods in Computational Fluid Dynamics (CFD). The workshop was organized by the CSG Parallelism and Performance together with SDL Energy Conversion and SDL Fluids. 97 people attended the workshop, which took place online on June 13 and 14, 2024.
CFD simulations are a crucial yet costly driving force behind many scientific computing, research, and industrial design endeavors. As such, they consume a large portion of the available computing time on High-Performance Computing (HPC) systems and are a worthwhile target for performance studies and optimizations.
Numerical experiments via CFD applications enable, for example, research towards low emissions in combustion engines or green fuels. Because of their relevance to research institutions and industry alike, and the ever-changing requirements both impose, CFD applications are constantly subjected to new developments and optimizations targeted at simulating new or increasingly large problems with high accuracy. Given the complexity of the software, the amount of computational resources required, and the complexity of the modern HPC systems used for the simulations, the importance and benefits of applying performance engineering techniques are evident.
This workshop was designed to highlight recent activities, developments, and new concepts for analyses and improvements centered around numerical methods used in CFD applications. Emphasis was placed on optimizations, applicable performance analyses and engineering techniques, and the presentation of new and innovative computational methods for specific problems.
The workshop’s aim was to connect research groups from different domains to increase the performance, efficiency, and capabilities of modern CFD applications by sparking discussions, potential collaborations, and an active exchange of ideas.
Workshop Report: Panel Discussion on Performance Engineering in CFD
At the recent workshop on Performance Engineering in CFD, moderated by Christian Hasse, a panel of experts including Harald Klimach, Temistocle Grenga, and Daniel Mira engaged in a stimulating discussion on the future of computational hardware and its impact on CFD. The panel covered several crucial topics:
Future Role of CPU-Based Systems
The panelists examined the evolving role of CPU-based systems in CFD. They noted that while some transition time might be necessary, the shift to GPU-based systems is likely to occur swiftly due to their superior power efficiency. Additionally, they discussed the possibility of a diversified computational environment that includes both CPU and GPU systems.
Community Codes in CFD
Addressing the potential for CFD to coalesce around a few community codes, the panelists agreed that if the field adapts slowly, a consolidation of codes is probable. This could streamline development and foster greater collaboration within the numerical combustion community.
The Impact of RISC-V
The discussion also touched on the potential changes brought about by RISC-V. The panelists speculated on how this open-standard architecture might influence future computational strategies, offering flexibility and customization that could benefit CFD applications.
Adapting to Hardware Limitations
A significant portion of the discussion focused on adapting CFD techniques to the limitations of current hardware. The use of mixed precision was highlighted as a promising approach to optimize performance while maintaining sufficient accuracy.
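To make the idea concrete, here is a minimal, hypothetical Python sketch (not taken from any code discussed at the workshop): the iteration sweeps run in single precision, while the convergence check accumulates the residual in double precision.

```python
import numpy as np

def jacobi_mixed(A, b, iters=500, tol=1e-4):
    """Jacobi solver illustrating a common mixed-precision pattern:
    cheap float32 sweeps, with the residual judged in float64."""
    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    diag = np.diag(A32).copy()
    R = A32 - np.diag(diag)              # off-diagonal part
    A64, b64 = A.astype(np.float64), b.astype(np.float64)
    x = np.zeros_like(b32)
    for _ in range(iters):
        x = (b32 - R @ x) / diag         # single-precision sweep
        r = b64 - A64 @ x.astype(np.float64)
        if np.linalg.norm(r) < tol:      # double-precision residual test
            break
    return x

# Diagonally dominant toy system, so the iteration converges.
rng = np.random.default_rng(0)
A = rng.random((100, 100)) + 100.0 * np.eye(100)
b = rng.random(100)
x = jacobi_mixed(A, b)
```

Whether a given CFD kernel tolerates single precision depends on its conditioning; in practice, mixed-precision schemes pair low-precision bulk arithmetic with higher-precision corrections or residual checks, as sketched above.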
Future of Software Development and Education
Looking ahead, the panelists emphasized the need for innovative software development and the importance of equipping young researchers with the necessary skills to navigate the evolving computational landscape. They advocated for integrating new teaching methods to prepare the next generation of CFD engineers.
The panel discussion provided valuable insights and sparked engaging conversations about the future of performance engineering in CFD. The workshop attendees left with a deeper understanding of the challenges and opportunities that lie ahead.
Below, along with the speakers and the descriptions of their talks, you will find links to the respective talks from the NHR4CES Community Workshop 2024.
All about our speakers and their talks
Dr. Marta Garcia Gasulla
has been a researcher at the Computer Science department of the Barcelona Supercomputing Center (BSC) since 2006. At BSC, she leads the Best Practices for Performance and Programmability (BePPP) group. BePPP aims to bridge the gap between scientific domain researchers and computer science researchers while promoting best practices and gathering co-design insights through performance analysis. She obtained her PhD in Computer Architecture from the Universitat Politecnica de Catalunya in 2017. Her research topics are load balancing, application performance, hybrid parallel programming, and parallel programming models. She has been involved in several European projects (HBP, Mont-blanc3, POP2, POP3, exaFOAM, EuPILOT, DEEP-SEA, and Plasma-PEPSC) and collaboration projects with companies (e.g., Intel, IBM, and Huawei). She was an associate professor at the Universitat Politecnica de Catalunya (UPC) between 2008 and 2013, lecturing on Operating Systems and Parallel Programming courses.
Dr. Marta Garcia Gasulla's talk: Lessons learned from a performance analysis of an OpenFOAM HPC Grand Challenge
CFD simulations are among the most resource-intensive HPC applications, and OpenFOAM is one of the best-known and most widely used CFD solvers. This talk presented a performance analysis of an HPC Grand Challenge with OpenFOAM; in particular, the case of a DLR high-pressure compressor (DLR CJH) with up to six thousand cores was analyzed.
Dr. Garcia Gasulla showed how combining different tools at different levels of detail makes it possible to analyze a large-scale simulation. Based on this analysis, several inefficiencies were identified and subsequently fixed by the developers, resulting in a speedup of more than 10x when running with 32K cores.
Gustavo de Morais
is a research associate within the Parallel Programming Group at the Technical University of Darmstadt. Currently pursuing his Ph.D., he focuses on advancing performance modeling in parallel programs. His research interests span instrumentation and measurement methods, performance modeling techniques, and strategies for mitigating noise effects in modeling.
Alexander Geiß
works as a research associate in the Parallel Programming Group at the Technical University of Darmstadt. He is currently undertaking a Ph.D., concentrating on enhancing the performance modeling of heterogeneous applications. His research covers various aspects including profiling heterogeneous applications, performance modeling, noise reduction, and cross-platform application performance prediction. Notably, he led the tools work package in the recently completed DEEP-SEA project, which focused on developing a software stack tailored for the upcoming European exascale systems.
Gustavo de Morais' and Alexander Geiß' talk: Performance Modeling for CFD Applications
Due to the complex numerical simulations associated with CFD, using HPC systems is often necessary. In this context, understanding the intricate scaling behavior of these applications is crucial for developers seeking to optimize their performance on such systems. This involves comprehending how the application behaves as the number of processes or the input size increases, and identifying potential bottlenecks.
In their talk, de Morais and Geiß demonstrated techniques for instrumenting, measuring, and understanding the performance of HPC applications, using CFD case studies based on OpenFOAM as examples. They delved into interpreting the scaling behavior, encompassing both computation and communication aspects, using the performance-modeling tool Extra-P.
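Conceptually, Extra-P derives human-readable models of runtime as a function of the process count automatically from repeated measurements. As a rough, hypothetical illustration of the underlying idea (not Extra-P's actual implementation), one can fit a single-term model t(p) = c + a·p^b to a handful of measurements and extrapolate it:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical strong-scaling measurements: process counts and runtimes (s).
procs = np.array([32, 64, 128, 256, 512], dtype=float)
times = np.array([410.0, 215.0, 118.0, 71.0, 52.0])

# A single-term model t(p) = c + a * p**b, a simplified stand-in for the
# richer model search that Extra-P performs automatically.
def model(p, c, a, b):
    return c + a * p**b

(c, a, b), _ = curve_fit(model, procs, times, p0=(10.0, 1e4, -1.0))
print(f"t(p) ≈ {c:.1f} + {a:.1f} * p^{b:.2f}")

# Extrapolate to a larger scale to expose a potential scalability bottleneck.
print("predicted t(1024) ≈", model(1024.0, c, a, b))
```

A constant term c that dominates at large p, or an exponent b far from the ideal -1, points to the kind of scaling bottleneck such models are meant to reveal.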
Paul Wilhelm
studied mathematics at RWTH Aachen University, graduating in 2019. Since 2020, he has been a PhD student at ACoM with Manuel Torrilhon. He is a fellow of the NHR Graduate School and works on grid-free, structure-preserving numerical methods for kinetic plasma physics.
Fabian Orland
received his Bachelor’s and Master’s degrees in Computer Science from RWTH Aachen University. In August 2019 he joined the chair for high-performance computing at the IT Center of RWTH Aachen University as a research assistant and PhD student. From 2019 until 2022 he was a member of the EU Center of Excellence Performance Optimisation and Productivity (POP2) providing performance assessment services for academic and industrial users from many different scientific disciplines. Since 2021, Fabian has been a member of the Cross-Sectional Group Parallelism and Performance at the National High-Performance Computing Center for Computational Engineering Sciences (NHR4CES). Since 2022, he has also been a member of the EU Center of Excellence RAISE.
Paul Wilhelm's and Fabian Orland's talk: Assessing the performance of solvers for kinetic plasma dynamics in a six-dimensional phase-space
Plasma is an ionized state of matter that occurs not only in stars and galactic formations but is also highly relevant for modern high-tech applications such as nuclear fusion reactors or microchip production. In particular, where the involved plasmas are in a high-temperature state, as in fusion reactors, they exhibit kinetic effects that can no longer be captured by fluid-dynamical models. One therefore has to resort to the Vlasov equation arising from kinetic theory.
While understanding kinetic effects is crucial, solving the high-dimensional and turbulent Vlasov equation poses significant challenges for classical numerical approaches. Without simplifications, one has to solve a time-dependent, six-dimensional partial differential equation (PDE), for which a resolution that captures the dynamics accurately over long times is infeasible even on modern high-performance computing hardware. Common discretization approaches for the Vlasov equation are the so-called semi-Lagrangian schemes, which combine the grid-based Eulerian and particle-based Lagrangian perspectives on convection-dominated PDEs to improve the stability and accuracy of the respective methods.
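For reference, in normalized units and for a single electron species with a neutralizing ion background, the Vlasov–Poisson system on the six-dimensional phase space reads:

```latex
\partial_t f + v \cdot \nabla_x f - E(x,t) \cdot \nabla_v f = 0, \qquad
E = -\nabla_x \phi, \qquad
-\Delta_x \phi = 1 - \int_{\mathbb{R}^3} f(x,v,t)\,\mathrm{d}v,
```

where f(x, v, t) is the particle distribution function with x, v ∈ ℝ³; resolving all six phase-space dimensions on a grid is what drives the cost discussed above.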
One such semi-Lagrangian approach, able to handle the full six-dimensional case to some extent, is the semi-Lagrangian discontinuous Galerkin scheme (SLDG).
Against this background, Orland and Wilhelm presented a novel approach to solving the Vlasov–Poisson equation: the Numerical Flow Iteration (NuFI). To evaluate the numerical solution, NuFI stores the low-dimensional electric potentials and uses them to iteratively reconstruct the characteristics backward in time. This reduces the total memory footprint by several orders of magnitude, as complexity is shifted from memory access to on-the-fly computation. Furthermore, this ansatz yields strong conservation properties because it exploits the structure of the solution.
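The structure being exploited is the classical fact that the distribution function is constant along characteristics. In a sketch of the idea (the exact formulation used in NuFI may differ), the trajectories satisfy

```latex
\dot{X}(t) = V(t), \qquad \dot{V}(t) = -E\bigl(X(t), t\bigr), \qquad
f(x, v, t) = f_0\bigl(X(0),\, V(0)\bigr),
```

where (X, V) is the trajectory ending at (x, v) at time t. Tracing it backward requires only the electric field, i.e. the stored low-dimensional potentials, rather than the full six-dimensional solution at earlier times.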
The Center of Excellence Performance Optimisation and Productivity (POP) established a standardized methodology for assessing the performance of parallel applications based on hierarchical efficiency metrics. The speakers introduced the hierarchical model and demonstrated how they applied it to assess the performance of NuFI compared with the SLDG approach.
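In its commonly published form, the POP hierarchy decomposes efficiency multiplicatively, roughly as:

```latex
\begin{aligned}
\text{Global Eff.} &= \text{Computation Eff.} \times \text{Parallel Eff.},\\
\text{Parallel Eff.} &= \text{Load Balance} \times \text{Communication Eff.},\\
\text{Communication Eff.} &= \text{Serialization Eff.} \times \text{Transfer Eff.}
\end{aligned}
```

Each factor isolates one potential source of inefficiency, so a low overall value can be traced down the hierarchy to, say, load imbalance rather than slow data transfers.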
Prof. Ricardo Vinuesa
has been an Associate Professor at KTH Royal Institute of Technology (Stockholm, Sweden) and Lead Faculty of the KTH Climate Action Centre since 2020. He received his bachelor’s degree in mechanical engineering from the Polytechnic University of Valencia (Spain) in 2009 and his PhD in mechanical and aerospace engineering from Illinois Institute of Technology (Chicago, USA) in 2013. His research focuses on developing machine learning to understand complex turbulent flows through high-fidelity simulations and to study, predict, and control turbulent flows. The main activities include using data-driven methods to tackle outstanding problems in fluid mechanics, engineering, and sustainability.
Prof. Ricardo Vinuesa's talk: Explaining and controlling turbulent flows through deep learning
In this presentation, a deep-learning explainability framework was first used to identify the most important Reynolds-stress (Q) events in a turbulent channel (simulated with DNS) and a turbulent boundary layer (measured experimentally). This objective method for evaluating importance shows that the most important Q events are not those with the highest Reynolds shear stress. The framework was also used to identify entirely new coherent structures, revealing that the most important coherent regions in the flow have only a 70% overlap with the classical Q events.
In the second part of the talk, Deep Reinforcement Learning (DRL) was used to discover completely new strategies for active flow control. Vinuesa and his group showed that DRL, applied to a blowing-and-suction scheme, significantly outperforms classical opposition control in a turbulent channel: the former achieves a 30% drag reduction, the latter only 20%. They concluded that DRL has enormous potential to reduce drag in a wide range of complex turbulent flow configurations.
Dr. Joachim Jenke
is a member of NHR4CES’s CSG Parallelism and Performance.
Dr. Joachim Jenke's talk: Performance and Correctness Analysis of an Exascale Application
This presentation highlighted correctness and performance analysis results for an open-source neuroscience application. Code developers had reported unexpected crashes and hangs of the application. Data-race analysis with ThreadSanitizer, available in GNU and LLVM compilers, combined with his extension for OpenMP-aware data-race analysis, identified possible causes of the application’s behavior. The talk walked through the analysis setup and the reports from the tool, and furthermore discussed insights into the performance behavior of the code.
Dr. Temistocle Grenga
has been a Lecturer in Computational Fluid Dynamics for Aerospace at the University of Southampton since February 2023. He is an expert in the numerical simulation of turbulent reacting flows and data-driven modeling. He received an M.Sc. in Aeronautical Engineering from the Sapienza University of Rome (Italy) in 2009, an M.Sc. in Mechanical Engineering from the University of Notre Dame (USA) in 2013, and a Ph.D. in Aerospace and Mechanical Engineering from the same university in 2015. He was a Postdoctoral Research Associate at Princeton University (USA) from September 2015 to August 2018 and at RWTH Aachen (Germany) from September 2018 to January 2023, where he was also leader of the Multiphase Group and the HPC Group. He has published more than 75 papers in international journals and conference proceedings. In his former position at RWTH Aachen, he supervised six PhD students investigating machine-learning modeling of turbulent flows, conservative numerical methods for interface tracking, an HPC GPU-based library for chemistry in CFD, and reduced-order modeling of multiphase flows. He has been invited as a lecturer on machine-learning applications for fluid dynamics and combustion at several European schools for Ph.D. students.
Dr. Temistocle Grenga's talk: Efficient use of computational resources: The Wavelet Adaptive Multi-resolution Method
The intricate coupling between fluid mechanics, heat transfer, and chemistry is particularly difficult to simulate accurately. Diffusive effects and chemical reactions occur on molecular scales, and the resolution required to capture viscous effects is on the order of one-tenth of a micron. Thus, spatial scales span seven or more orders of magnitude. Similarly, time scales can range from those associated with chemical reactions (nanoseconds) up to macroscopic time scales on the order of seconds or more.
These multiscale problems are generally impractical to solve (in terms of computer time and memory required) on a fixed computational grid. However, in many problems of practical interest, small scales only occur in limited regions of the computational domain and possibly at certain times.
Wavelet methods are particularly well suited for adaptively solving partial differential equations (PDEs). Wavelets are mathematical functions with compact support in both location and scale. Their amplitudes indicate the local regularity of a function: a wavelet coefficient is large in regions where the function changes sharply and small where it is smooth.
Dynamically adaptive algorithms for initial value problems use the sparse set of collocation points at a previous time step to generate an updated set of points for the next time step, effectively tracking fine-resolution features as they develop in the domain. Wavelets provide a methodology for controlling grid adaptivity without requiring ad hoc error estimates or heuristics.
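As a toy illustration of how wavelet coefficients can drive grid adaptivity (a one-level Haar transform in Python, far simpler than the wavelet bases used in WAMR), the sketch below retains only the intervals whose detail coefficients exceed a threshold:

```python
import numpy as np

def haar_detail(f):
    """One level of the (unnormalized) Haar transform: averages and details."""
    avg = 0.5 * (f[0::2] + f[1::2])
    det = 0.5 * (f[0::2] - f[1::2])
    return avg, det

# Sample a function with a sharp feature on a uniform fine grid.
x = np.linspace(0.0, 1.0, 1024)
f = np.tanh((x - 0.5) / 0.01)  # steep front at x = 0.5

avg, det = haar_detail(f)

# Detail (wavelet) coefficients act as a local regularity indicator:
# keep fine-grid intervals only where the detail magnitude is significant.
eps = 1e-3
keep = np.abs(det) > eps
print(f"retained {keep.sum()} of {det.size} fine-grid intervals")
# The retained intervals cluster around the steep front, so the adapted
# grid is dense near x = 0.5 and coarse in the smooth regions.
```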
The parallel Wavelet Adaptive Multi-resolution (WAMR) method has been implemented in Fortran 90 and uses the MPI standard for parallelization together with a domain-decomposition approach. It has been verified on several test problems in one, two, and three dimensions, including the classic Sod shock-tube problem and the Taylor–Sedov blast wave.
The main feature of WAMR is its capability to reach the same accuracy as other methods using a grid four orders of magnitude smaller. The method has been applied to three compressible-flow problems, among them the evolution of a Richtmyer–Meshkov instability and the evolution of a hydrogen bubble struck by a shock in air, the latter simulated in several configurations.
Dr. Daniel Mira
is the Head of the Propulsion Technologies Group at the Computing Applications for Science and Engineering (CASE) Department of the Barcelona Supercomputing Center (BSC). Dr. Mira received his Bachelor’s Degree in mechanical engineering from the Universitat Politècnica de Valencia (Spain) in 2008 and his PhD in mechanical engineering from Lancaster University in 2012. His research is focused on the development of advanced simulation methods to investigate the combustion characteristics of propulsion and power systems. The main activities include physical modeling and numerical methods using High-Performance Computing (HPC) and data-driven approaches.
Dr. Daniel Mira's talk: Porting CPU-based/optimized combustion codes to CPU-GPU heterogeneous architectures
The increase in computational power over the past decade, coupled with the upcoming exascale supercomputers, heralds a new era in computational modeling and simulation within combustion science. Given the complex multiscale and multiphysics nature of turbulent reacting flows, combustion simulations stand out as among the most demanding applications for state-of-the-art supercomputers. Exascale computing promises to push the boundaries of combustion-system simulation, allowing for more realistic conditions through high-fidelity methods. However, to effectively leverage these computing architectures, it is essential to employ methodologies that can exploit parallelism at all levels. This presentation covered practical aspects to consider when porting CPU-based codes to accelerators, drawn from the experience of developing the multiphysics code Alya for combustion applications.
Dr. Harald Klimach
graduated in aerospace engineering at the University of Stuttgart in 2005 and afterward worked at the High-Performance Computing Centre Stuttgart (HLRS) in the European DEISA and PRACE projects on user-code porting and optimization, the PRACE benchmark suite, and a joint research activity on coupled fluid-fluid simulations. In 2010, he took up a research position at the Chair of Advanced Simulations in Engineering at the German Research School for Simulation Sciences (Prof. Sabine Roller), a collaboration between RWTH Aachen University and Forschungszentrum Jülich. There he started, together with the group, the development of the APES framework as a common basis for the chair’s various activities concerning fluid simulations.
From 2013, he worked as a researcher at the University of Siegen, continuing the development of APES and teaching programming in Fortran and parallelization with MPI. He defended his PhD thesis, “Parallel Multi-Scale-Simulations with Octrees and Coupled Applications,” at RWTH Aachen in 2016 and has been working since 2021 at the DLR Institute of Software Methods for Product Virtualization in Dresden. Over the last decade, he has also served as a lecturer and organizer in the introductory CFD course in the HLRS course program.
Dr. Harald Klimach's talk: Performance Assessments of the Lattice Boltzmann Solver Musubi
Dr. Klimach presented the core concepts of Musubi and its deployment in high-performance computing environments for detailed flow simulations, and elaborated on its performance analysis over time on various supercomputing systems.
Musubi is an open-source lattice Boltzmann method (LBM) solver under development since 2011 within the APES framework, using an octree mesh representation. It is implemented in modern Fortran and parallelized with the help of MPI and OpenMP. Dr. Klimach introduced some computational aspects of the LBM and why it is attractive for certain CFD tasks, detailed the implementation ideas of Musubi, and explained the wider architecture of APES before diving into performance analyses conducted for the code. Performance assessments have been performed for the solver repeatedly over time by different means and methods, including some analysis by HLRS within the POP Center of Excellence project.
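To give a flavor of the method’s computational pattern (a generic, textbook D2Q9 BGK step in Python, not Musubi’s Fortran implementation), a single stream-and-collide update can be sketched as:

```python
import numpy as np

# D2Q9 lattice: discrete velocities and weights (standard textbook values).
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, u):
    """BGK equilibrium distributions for density rho and velocity u."""
    cu = np.einsum('qd,xyd->qxy', c, u)        # c_q . u
    usq = np.einsum('xyd,xyd->xy', u, u)       # |u|^2
    return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def stream_collide(f, tau=0.6):
    """One LBM step: the update is purely local plus a fixed shift per
    direction, which is why it parallelizes and vectorizes so well."""
    rho = f.sum(axis=0)                                   # moments
    u = np.einsum('qd,qxy->xyd', c, f) / rho[..., None]
    f += -(f - equilibrium(rho, u)) / tau                 # collide
    for q in range(9):                                    # stream
        f[q] = np.roll(f[q], shift=c[q], axis=(0, 1))
    return f

# Uniform flow on a small periodic grid as a smoke test.
nx, ny = 64, 64
f = equilibrium(np.ones((nx, ny)), np.full((nx, ny, 2), 0.05))
for _ in range(10):
    f = stream_collide(f)
```

Each update touches only local data plus one neighbor shift per direction, which is one reason LBM codes such as Musubi map well onto large-scale parallel hardware.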
He recounted how those performance analyses shed light on various aspects of the implementation and helped improve execution on large-scale computing systems. The code has been deployed across a range of architectures, spanning from IBM’s BlueGene to NEC’s SX vector computers.
At the heart of those performance assessments always lies the evaluation of serial and node-level performance, which establishes a baseline for the parallel execution. An important overall observation is that performance depends on many factors acting together and, depending on the features used in the code, may vary widely. Klimach addressed these points and elaborated on common themes and apparent differences.
Prof. Christian Hasse
is principal investigator of our SDL Energy Conversion and professor at the Technical University of Darmstadt. Since 2017, he has been the head of the Institute for the Simulation of Reactive Thermo-Fluid Systems, currently comprising 30 PhD students and postdocs. He received his diploma in mechanical engineering in 1998 and his PhD in 2004 (supervisor: Norbert Peters), both at RWTH Aachen University. After working in engine development at BMW in Munich for 5.5 years, he returned to academia in 2010. From 2010 to 2017, he was Professor of Numerical Thermofluid Dynamics at the Technical University of Freiberg before moving to his current position in Darmstadt.
He has published more than 240 scientific papers in peer-reviewed journals and is also a reviewer for more than 20 scientific journals and several national and international funding agencies.
Prof. Christian Hasse hosted our panel discussion.