WP 4. Scalable parallel solvers for structural and fluid mechanics problems

The objectives are to adapt traditional fluid and structural mechanics solvers to exascale systems and validate them on petascale machines, and to explore new formulations, such as particle methods, that are better suited to exascale paradigms and to parallel computation in general.

This task covers all fluid analysis solvers for which scalability to a million cores is almost assured. Solvers expected to scale well include all explicit solvers, iterative implicit solvers based on element or edge loops, diagonal (or block-diagonal) preconditioners, and many solvers in which the mesh topology does not change during the run.
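To illustrate why such solvers are expected to scale, the following minimal Python sketch runs a diagonally (Jacobi) preconditioned conjugate gradient iteration whose matrix-vector product is an edge loop. It is not KRATOS code: the function names (`edge_matvec`, `jacobi_pcg`), the edge-list storage and the small 1D test problem are assumptions made purely for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def edge_matvec(diag, edges, x):
    """y = A x for a symmetric matrix stored as a diagonal plus an edge list."""
    y = [d * xi for d, xi in zip(diag, x)]
    for i, j, a_ij in edges:  # edge loop: each edge touches only its two nodes
        y[i] += a_ij * x[j]
        y[j] += a_ij * x[i]
    return y


def jacobi_pcg(diag, edges, b, tol=1e-10, max_iter=200):
    """Conjugate gradients with a diagonal (Jacobi) preconditioner."""
    x = [0.0] * len(b)
    r = list(b)                                   # residual for x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # pointwise preconditioning
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = edge_matvec(diag, edges, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical test problem: 1D Laplacian on a chain of 5 nodes.
n = 5
diag = [2.0] * n
edges = [(i, i + 1, -1.0) for i in range(n - 1)]
b = [1.0] * n
x, iters = jacobi_pcg(diag, edges, b)
```

In this sketch each edge updates only its two end nodes and the preconditioner acts pointwise, so the only global operations per iteration are the dot products. That locality is the property the paragraph above relies on.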

As a starting point, we will use the family of explicit and implicit flow solvers implemented in the multiphysics code KRATOS, developed at CIMNE. Parts of these solvers have already been ported to an OpenMP/shared-memory environment, to GPUs and, to a limited extent, to an MPI/distributed-memory environment. These will be combined in a hybrid programming approach as long as no other alternative proves mature enough. We therefore do not start from a blank sheet, which strengthens our chances of success.

Emphasis will be placed on identifying commonalities (as described in section B1.1.4 of the DoW) among simple fluid solvers that can be of general interest and applicability for other HPC developments.

Task leader: **LUH-IKM**. Partners involved: **CIMNE, LUH-IKM, NTUA**

We group in this task all fluid dynamics analysis solvers (of both explicit and implicit type) that will be more difficult to scale to a million cores. Solvers that fall into this group include many of the preconditioners (e.g. LU-SGS) for implicit solvers, cases in which the mesh topology changes during the run, and multiphysics explicit and implicit solvers for particle-based and element-based methods.

For each of these solvers, gross load imbalances can appear during the run. This emphasizes the need for a scalable, parallel load balancer such as the one we intend to develop in NUMEXAS. At the same time, we will develop and implement the option of combining structured (Cartesian) and unstructured regions in a single grid and running them with maximum efficiency. A number of researchers have advocated Cartesian-only techniques, but we consider the option of running with boundary-fitted grids suitable for Reynolds-averaged simulations (i.e. boundary-layer-resolving grids) essential for credible engineering runs. The algorithms developed in this task will be implemented in the suite of explicit and implicit fluid analysis solvers available in KRATOS.
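The kind of imbalance referred to above can be quantified and reduced as in the following minimal Python sketch. It is not the NUMEXAS load balancer: it is a serial, greedy "heaviest block first" heuristic that ignores communication cost and data locality, and the block weights, the static ownership and the function names are hypothetical values chosen for illustration.

```python
import heapq


def imbalance(loads):
    """Ratio of the maximum load to the mean load (1.0 means perfect balance)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0


def loads_per_rank(weights, owner, n_ranks):
    loads = [0.0] * n_ranks
    for item, rank in enumerate(owner):
        loads[rank] += weights[item]
    return loads


def greedy_rebalance(weights, n_ranks):
    """Assign the heaviest blocks first, each to the currently least-loaded rank."""
    heap = [(0.0, rank) for rank in range(n_ranks)]
    heapq.heapify(heap)
    owner = [None] * len(weights)
    for item in sorted(range(len(weights)), key=lambda k: -weights[k]):
        load, rank = heapq.heappop(heap)
        owner[item] = rank
        heapq.heappush(heap, (load + weights[item], rank))
    return owner


# Hypothetical per-block costs after the workload has drifted during a run.
weights = [5.0, 5.0, 4.0, 4.0, 1.0, 1.0, 2.0, 2.0]
static_owner = [0, 0, 1, 1, 2, 2, 3, 3]   # original decomposition, 4 ranks
static_loads = loads_per_rank(weights, static_owner, 4)

new_owner = greedy_rebalance(weights, 4)
new_loads = loads_per_rank(weights, new_owner, 4)
```

A production balancer would additionally have to run in parallel and account for the cost of migrating data between ranks, which this sketch deliberately leaves out.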

Emphasis will be placed on identifying commonalities (as described in section B1.1.4 of the DoW) among simple fluid solvers that can be of general interest and applicability for other HPC developments.

Task leader: **CIMNE**. Partners involved: **CIMNE, LUH-IKM, NTUA, QUANTECH**

Explicit structural analysis solvers are used in many applications in engineering and the applied sciences. This type of solution technique is best suited to frictional contact situations typical of impact, crashworthiness and blasting, among others. Experience to date has been that all explicit structural mechanics solvers scale reasonably well as long as contact and/or rupture are not present in the simulation. This is, however, the exception and not the norm. For instance, none of the applications listed above can be carried out without modelling contact and/or rupture.

Explicit structural analysis using the Finite Element Method (FEM). The difficulty in using explicit analysis techniques with the FEM stems from the fact that two (or three) competing core algorithms consume similar amounts of CPU time: element right-hand sides (internal forces), contact, and rupture. Contact and rupture have global characteristics, whereas internal forces can be computed in a highly localized manner. To run optimally, each of these computing tasks requires a different load subdivision. New algorithms that overcome these difficulties will therefore be developed.

Explicit analysis using the Discrete Element Method (DEM). The DEM is particularly suited to modelling problems characterized by strong discontinuities, such as rock fracturing during excavation. The discrete element model assumes that the material can be represented by an assembly of rigid particles interacting among themselves. The contact law can be seen as the formulation of the material model at the microscopic level.

The DEM is a challenging numerical approach that can benefit greatly from the features of exascale computing. Compared with traditional methods for solving PDEs, it is much closer to the multi-agent paradigm.

Indeed, unlike traditional formulations, which consider the domain as a whole and in which each element can be affected by any of the others, the DEM takes into account only the neighbours of each particle. It can therefore be computed naturally by distributing sets of particles over the millions of cores, without demanding large movements of data.
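The neighbour-only character of the method can be illustrated with the following minimal Python sketch. It is not code from DEMPACK or DEMFLOW: the 2D equal-radius discs, the cell-list neighbour search, the linear spring contact law and all names and parameter values are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def contact_pairs(positions, radius, cell_size):
    """Pairs (i, j), i < j, of overlapping discs; requires cell_size >= 2 * radius."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)
    pairs = set()
    for (cx, cy), members in cells.items():
        # Only the cell itself and its 8 neighbours are inspected.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        d2 = ((positions[i][0] - positions[j][0]) ** 2
                              + (positions[i][1] - positions[j][1]) ** 2)
                        if 0.0 < d2 < (2.0 * radius) ** 2:
                            pairs.add((i, j))
    return sorted(pairs)


def contact_forces(positions, radius, stiffness):
    """Linear spring repulsion along the line of centres (assumed contact law)."""
    forces = [[0.0, 0.0] for _ in positions]
    for i, j in contact_pairs(positions, radius, cell_size=2.0 * radius):
        dx = positions[j][0] - positions[i][0]
        dy = positions[j][1] - positions[i][1]
        dist = math.hypot(dx, dy)
        magnitude = stiffness * (2.0 * radius - dist)
        forces[i][0] -= magnitude * dx / dist
        forces[i][1] -= magnitude * dy / dist
        forces[j][0] += magnitude * dx / dist
        forces[j][1] += magnitude * dy / dist
    return forces


def explicit_step(positions, velocities, radius, stiffness, mass, dt):
    """One explicit (symplectic Euler) time step."""
    forces = contact_forces(positions, radius, stiffness)
    new_v = [(vx + dt * fx / mass, vy + dt * fy / mass)
             for (vx, vy), (fx, fy) in zip(velocities, forces)]
    new_x = [(x + dt * vx, y + dt * vy)
             for (x, y), (vx, vy) in zip(positions, new_v)]
    return new_x, new_v


# Hypothetical configuration: two overlapping discs and one isolated disc.
positions = [(0.0, 0.0), (0.9, 0.0), (5.0, 5.0)]
velocities = [(0.0, 0.0)] * 3
pairs = contact_pairs(positions, radius=0.5, cell_size=1.0)
forces = contact_forces(positions, radius=0.5, stiffness=100.0)
new_positions, new_velocities = explicit_step(
    positions, velocities, radius=0.5, stiffness=100.0, mass=1.0, dt=0.01)
```

Because each particle only looks at its own cell and the adjacent ones, a distributed version of this sketch would need to exchange just a thin layer of boundary cells between neighbouring ranks.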

In this task we aim at a disruptive success of the method by developing explicit numerical techniques that can handle the huge number of particles (of the order of billions) needed to solve real-scale problems on exascale computers, so that the DEM can really make a difference with respect to other, more conventional methods. The algorithms and methods developed in this task will be implemented in the DEMPACK and DEMFLOW codes developed at CIMNE and IKM, respectively.

Emphasis will be placed on identifying commonalities (as described in section B1.1.4 of the DoW) among simple fluid solvers that can be of general interest and applicability for other HPC developments.

Task leader: **CIMNE**. Partners involved: **CIMNE, LUH-IKM, NTUA**

In this task we will develop parallel algorithms for solving, in the exascale computing environment, the extremely large systems of algebraic equations that arise in the implicit solution of large-scale structural mechanics problems using the FEM.

This high demand for computing resources will be addressed by implementing innovative domain decomposition (DD) methods specially tailored to the particular types of problem at hand. Applying such DD methods involves partitioning the discretized model into a set of disconnected, non-overlapping subdomains. The resulting interface problem is handled by a preconditioned conjugate projected gradient (PCPG) algorithm. Each projection step of the PCPG algorithm involves the solution of a coarse problem, which guarantees the parallel and numerical scalability of the DD method along with a small iteration count. Improved variants have been proposed by researchers at NTUA for the efficient handling of multiple and/or repeated right-hand sides and of nearby problems; these will be further elaborated for application in a multi-core computing environment. A recent study performed at NTUA has shown the tremendous potential of implementing DD methods on hybrid CPU/GPU workstation platforms. The proposed implementation ensures high hardware utilization and a substantial reduction of idle time, both of which are major issues for the efficient exploitation of the computing power available in hybrid CPU/GPU HPC architectures. The workload balancing scheme across different processors that we aim to develop in this task is intended to be highly portable to different platforms and to exploit the capabilities of the heterogeneous hardware to the extent that high speed-up factors, in the range of two orders of magnitude, can be achieved even for a single one-CPU/one-GPU configuration.
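The structure of a PCPG iteration, with a coarse solve inside every projection, can be sketched in Python as follows. This is a toy illustration in the style of FETI-type dual DD methods, which is our assumption about the family of methods meant here, and it is not the NTUA implementation. The dense interface operator `F`, the single-column constraint vector `g` (so that the coarse problem reduces to one scalar division), the diagonal preconditioner and all numerical values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def axpy(a, x, y):
    """Return a * x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def project(g, v):
    """P v = v - g (g^T g)^{-1} g^T v; the division is the (scalar) coarse solve."""
    coarse = dot(g, v) / dot(g, g)
    return axpy(-coarse, g, v)


def pcpg(F, d, g, e, precond_diag, tol=1e-10, max_iter=100):
    """Solve F lam = d in the projected sense, subject to g^T lam = e."""
    lam = [e / dot(g, g) * gi for gi in g]      # initial guess satisfies the constraint
    r = axpy(-1.0, matvec(F, lam), d)           # r = d - F lam
    d_norm = dot(d, d) ** 0.5 or 1.0
    p, yw_old = None, None
    for it in range(max_iter):
        w = project(g, r)                       # projected residual
        if dot(w, w) ** 0.5 <= tol * d_norm:
            return lam, it
        z = [wi / mi for wi, mi in zip(w, precond_diag)]   # preconditioning
        y = project(g, z)                       # re-project the preconditioned residual
        yw = dot(y, w)
        p = y if p is None else axpy(yw / yw_old, p, y)
        Fp = matvec(F, p)
        alpha = yw / dot(p, Fp)
        lam = axpy(alpha, p, lam)
        r = axpy(-alpha, Fp, r)
        yw_old = yw
    return lam, max_iter


# Hypothetical 4-unknown interface problem with one constraint.
F = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 3.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
d = [1.0, 2.0, 3.0, 4.0]
g = [1.0, 1.0, 1.0, 1.0]
e = 1.0
lam, iters = pcpg(F, d, g, e, precond_diag=[row[i] for i, row in enumerate(F)])
```

Every search direction is built from projected vectors, so the constraint imposed by the initial guess is preserved throughout. In a real DD solver the scalar division in `project` becomes a small global linear system, which is the coarse problem mentioned above.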

Task leader: **NTUA**. Partners involved: **CIMNE, LUH-IKM, NTUA**

Lead beneficiary: **LUH**

Lead beneficiary: **CIMNE**

Lead beneficiary: **CIMNE**

Lead beneficiary: **NTUA**

**611636**