Comptes Rendus
Parallel hierarchical hybrid linear solvers for emerging computing platforms
Comptes Rendus. Mécanique, High Performance Computing, Volume 339 (2011) no. 2-3, pp. 96-103.

The design of the extreme-scale platforms that are expected to become available in the forthcoming decade will represent a convergence of technological trends and the boundary conditions imposed by over half a century of algorithm and application software development. These platforms will be hierarchical because they provide coarse grain parallelism between nodes and fine grain parallelism within each node. They are also expected to be very heterogeneous since multi-core chips and accelerators have completely different architectures and potentials. It is clear that such a degree of complexity will embody radical changes that will render obsolete the current software infrastructure for large-scale scientific applications. In this paper, we illustrate a hierarchical algorithmic approach for the implementation of an efficient parallel sparse linear solver that combines direct and iterative methods. Such a hybrid approach exploits the advantages of both numerical techniques and enables the use of several levels and grains of parallelism. This combination express different levels of parallelism and permits an optimal trade-off between numerical and parallel efficiency. Consequently, such a numerical technique appears as a promising candidate for intensive simulations on future many-core parallel platforms.

La conception des plateformes d'échelle extrême qui devraient être disponibles dans la décade à venir représenteront la convergence de tendances technologiques et définiront le cadre imposé au développement des futurs algorithmes et applications scientifiques. Ces plateformes seront hiérarchiques puisqu'elles disposeront d'un parallélisme à grain moyen entre les nœuds de calcul et d'un autre à grain fin à l'intérieur des unités de calcul. Elles seront également probablement très hétérogènes puisque leurs composantes multi-cœurs et cartes accélératrices auront des architectures et des caractéristiques très différentes. Il est clair qu'un tel degré de complexité nécessitera des changements radicaux qui rendront obsolètes les infracstrcutures logicielles actuelles pour le calcul des grandes applications. Dans cet article, nous illustrons une approche algorihmique hiérarchique pour la mise en œuvre d'un solveur linéaire parallèle qui combine des méthodes itératives et directes. Une telle approche hybride exploite les avantages des deux méthodes et permet l'utilisation de plusieurs niveaux de parallélisme de grains différents. Cette combinaison autorise un compromis optimal entre la performance parallèle et numérique de l'algorithme. En conséquence, une telle approche numérique apparaît comme une candidate prometteuse pour le calcul intensif sur les futures machines multi-cœurs.

Published online:
DOI: 10.1016/j.crme.2010.11.005
Keywords: Computer science, Hierarchical platform, Parallel architecture
Mots-clés : Informatique, Plateforme hiérarchique, Architecture parallèle

Emmanuel Agullo 1; Luc Giraud 1; Abdou Guermouche 1; Jean Roman 1

1 HiePACS Project – INRIA Bordeaux Sud-Ouest, Joint INRIA–CERFACS Lab. on High Performance Computing, PRES de Bordeaux, 351, Cours de la Libération, 33405 Talence cedex, France
@article{CRMECA_2011__339_2-3_96_0,
     author = {Emmanuel Agullo and Luc Giraud and Abdou Guermouche and Jean Roman},
     title = {Parallel hierarchical hybrid linear solvers for emerging computing platforms},
     journal = {Comptes Rendus. M\'ecanique},
     pages = {96--103},
     publisher = {Elsevier},
     volume = {339},
     number = {2-3},
     year = {2011},
     doi = {10.1016/j.crme.2010.11.005},
     language = {en},
}
TY  - JOUR
AU  - Emmanuel Agullo
AU  - Luc Giraud
AU  - Abdou Guermouche
AU  - Jean Roman
TI  - Parallel hierarchical hybrid linear solvers for emerging computing platforms
JO  - Comptes Rendus. Mécanique
PY  - 2011
SP  - 96
EP  - 103
VL  - 339
IS  - 2-3
PB  - Elsevier
DO  - 10.1016/j.crme.2010.11.005
LA  - en
ID  - CRMECA_2011__339_2-3_96_0
ER  - 
%0 Journal Article
%A Emmanuel Agullo
%A Luc Giraud
%A Abdou Guermouche
%A Jean Roman
%T Parallel hierarchical hybrid linear solvers for emerging computing platforms
%J Comptes Rendus. Mécanique
%D 2011
%P 96-103
%V 339
%N 2-3
%I Elsevier
%R 10.1016/j.crme.2010.11.005
%G en
%F CRMECA_2011__339_2-3_96_0
Emmanuel Agullo; Luc Giraud; Abdou Guermouche; Jean Roman. Parallel hierarchical hybrid linear solvers for emerging computing platforms. Comptes Rendus. Mécanique, High Performance Computing, Volume 339 (2011) no. 2-3, pp. 96-103. doi : 10.1016/j.crme.2010.11.005. https://comptes-rendus.academie-sciences.fr/mecanique/articles/10.1016/j.crme.2010.11.005/

[1] A. Buttari; J. Langou; J. Kurzak; J.J. Dongarra A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Comput. Syst. Appl., Volume 35 (2009), pp. 38-53

[2] E. Chan, E.S. Quintana-Orti, G. Gregorio Quintana-Orti, R. van de Geijn, Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, in: Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures SPAA'07, June 2007, pp. 116–125.

[3] J. Demmel, L. Grigori, M. Hoemmen, J. Langou, Communication-optimal parallel and sequential QR and LU factorizations, LAPACK Working Note 204, August 2008.

[4] E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, S. Tomov, Faster, cheaper, better – a hybridization methodology to develop linear algebra software for GPUs, Technical Report 230, LAPACK Working Note, September 2010.

[5] G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, H. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, P. Luszczek, A. YarKhan, J. Dongarra, Distributed-memory task execution and dependence tracking within DAGuE and the DPLASMA project, Technical Report 232, LAPACK Working Note, September 2010.

[6] N.J. Higham Accuracy and Stability of Numerical Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002

[7] Y. Saad Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia, 2003

[8] M.R. Hestenes; E. Stiefel Methods of conjugate gradients for solving linear system, J. Res. Nat. Bur. Stds. B, Volume 49 (1952), pp. 409-436

[9] Y. Saad; M.H. Schultz GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comp., Volume 7 (1986), pp. 856-869

[10] W. Hackbusch Multigrid Methods and Applications, Springer-Verlag, Berlin, 1985

[11] M.A. Heroux; P. Raghavan; H.D. Simon Parallel Processing for Scientific Computing, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2006

[12] R. Barrett; M. Berry; T.F. Chan; J. Demmel; J. Donato; J. Dongarra; V. Eijkhout; R. Pozo; C. Romine; H. Van der Vorst Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, PA, 1994

[13] J. Drkošová; M. Rozložník; Z. Strakoš; A. Greenbaum Numerical stability of the GMRES method, BIT, Volume 35 (1995), pp. 309-330

[14] A. Greenbaum Iterative Methods for Solving Linear Systems, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997

[15] B.F. Smith; P. Bjørstad; W. Gropp Domain Decomposition, Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, New York, 1996

[16] A. Quarteroni; A. Valli Domain Decomposition Methods for Partial Differential Equations, Numerical Mathematics and Scientific Computation, Oxford Science Publications, Oxford, 1999

[17] T. Mathew Domain Decomposition Methods for the Numerical Solution of Partial Differential Equations, Springer Lecture Notes in Computational Science and Engineering, Springer, 2008

[18] G. Karypis, V. Kumar, MeTiS – Unstructured graph partitioning and sparse matrix ordering system, version 2.0, University of Minnesota, June 1995.

[19] C. Chevalier; F. Pellegrini PT-SCOTCH: a tool for efficient parallel graph ordering, Parallel Comput., Volume 34 (2008) no. 6–8

[20] L.M. Carvalho; L. Giraud; G. Meurant Local preconditioners for two-level non-overlapping domain decomposition methods, Numer. Linear Algebra Appl., Volume 8 (2001) no. 4, pp. 207-227

[21] L. Giraud; A. Haidar; L.T. Watson Parallel scalability study of hybrid preconditioners in three dimensions, Parallel Comput., Volume 34 (2008), pp. 363-379

[22] A. Haidar, On the parallel scalability of hybrid solvers for large 3D problems, PhD dissertation, INPT, June 2008. TH/PA/08/57.

[23] X.-C. Cai; Y. Saad Overlapping domain decomposition algorithms for general sparse matrices, Numer. Linear Algebra Appl., Volume 3 (1996), pp. 221-237

[24] G. Radicati; Y. Robert Parallel conjugate gradient-like algorithms for solving nonsymmetric linear systems on a vector multiprocessor, Parallel Comput., Volume 11 (1989), pp. 223-239

[25] J.-F. Bourgat; R. Glowinski; P. Le Tallec; M. Vidrascu Variational formulation and algorithm for trace operator in domain decomposition calculations (Tony Chan; Roland Glowinski; Jacques Périaux; Olof Widlund, eds.), Domain Decomposition Methods, SIAM, Philadelphia, PA, 1989, pp. 3-16

[26] Y.-H. De Roeck; P. Le Tallec Analysis and test of a local domain decomposition preconditioner (Roland Glowinski; Yuri Kuznetsov; Gérard Meurant; Jacques Périaux; Olof Widlund, eds.), Fourth International Symposium on Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, PA, 1991, pp. 112-128

[27] P.E. Bjørstad; O.B. Widlund Iterative methods for the solution of elliptic problems on regions partitioned into substructures, SIAM J. Numer. Anal., Volume 23 (1986) no. 6, pp. 1093-1120

[28] P. Hénon; P. Ramet; J. Roman PaStiX: A high-performance parallel direct solver for sparse symmetric definite systems, Parallel Comput., Volume 28 ( January 2002 ) no. 2, pp. 301-321

[29] J.W. Demmel; J.R. Gilbert; X.S. Li An asynchronous parallel supernodal algorithm for sparse Gaussian elimination, SIAM J. Matrix Anal. Appl., Volume 20 (1999) no. 4, pp. 915-952

[30] PLASMA users' guide, parallel linear algebra software for multicore architectures, November 2009 http://icl.cs.utk.edu/plasma (version 2.0)

[31] S. Tomov; R. Nath; P. Du; J. Dongarra MAGMA, version 0.2, user guide, November 2009 http://icl.cs.utk.edu/magma

[32] J. Kurzak, J. Dongarra, Implementation of the mixed-precision high performance LINPACK benchmark on the CELL processor, Technical Report LAPACK Working Note #177 UT-CS-06-580, University of Tennessee Computer Science, September 2006.

[33] J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Dongarra, Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy, Technical Report LAPACK Working Note #175 UT-CS-06-574, University of Tennessee Computer Science, April 2006.

Cited by Sources:

Comments - Policy