The design of the extreme-scale platforms that are expected to become available in the forthcoming decade will represent a convergence of technological trends and the boundary conditions imposed by over half a century of algorithm and application software development. These platforms will be hierarchical because they provide coarse grain parallelism between nodes and fine grain parallelism within each node. They are also expected to be very heterogeneous since multi-core chips and accelerators have completely different architectures and potentials. It is clear that such a degree of complexity will embody radical changes that will render obsolete the current software infrastructure for large-scale scientific applications. In this paper, we illustrate a hierarchical algorithmic approach for the implementation of an efficient parallel sparse linear solver that combines direct and iterative methods. Such a hybrid approach exploits the advantages of both numerical techniques and enables the use of several levels and grains of parallelism. This combination express different levels of parallelism and permits an optimal trade-off between numerical and parallel efficiency. Consequently, such a numerical technique appears as a promising candidate for intensive simulations on future many-core parallel platforms.
La conception des plateformes d'échelle extrême qui devraient être disponibles dans la décade à venir représenteront la convergence de tendances technologiques et définiront le cadre imposé au développement des futurs algorithmes et applications scientifiques. Ces plateformes seront hiérarchiques puisqu'elles disposeront d'un parallélisme à grain moyen entre les nœuds de calcul et d'un autre à grain fin à l'intérieur des unités de calcul. Elles seront également probablement très hétérogènes puisque leurs composantes multi-cœurs et cartes accélératrices auront des architectures et des caractéristiques très différentes. Il est clair qu'un tel degré de complexité nécessitera des changements radicaux qui rendront obsolètes les infracstrcutures logicielles actuelles pour le calcul des grandes applications. Dans cet article, nous illustrons une approche algorihmique hiérarchique pour la mise en œuvre d'un solveur linéaire parallèle qui combine des méthodes itératives et directes. Une telle approche hybride exploite les avantages des deux méthodes et permet l'utilisation de plusieurs niveaux de parallélisme de grains différents. Cette combinaison autorise un compromis optimal entre la performance parallèle et numérique de l'algorithme. En conséquence, une telle approche numérique apparaît comme une candidate prometteuse pour le calcul intensif sur les futures machines multi-cœurs.
Mots-clés : Informatique, Plateforme hiérarchique, Architecture parallèle
Emmanuel Agullo 1; Luc Giraud 1; Abdou Guermouche 1; Jean Roman 1
@article{CRMECA_2011__339_2-3_96_0, author = {Emmanuel Agullo and Luc Giraud and Abdou Guermouche and Jean Roman}, title = {Parallel hierarchical hybrid linear solvers for emerging computing platforms}, journal = {Comptes Rendus. M\'ecanique}, pages = {96--103}, publisher = {Elsevier}, volume = {339}, number = {2-3}, year = {2011}, doi = {10.1016/j.crme.2010.11.005}, language = {en}, }
TY - JOUR AU - Emmanuel Agullo AU - Luc Giraud AU - Abdou Guermouche AU - Jean Roman TI - Parallel hierarchical hybrid linear solvers for emerging computing platforms JO - Comptes Rendus. Mécanique PY - 2011 SP - 96 EP - 103 VL - 339 IS - 2-3 PB - Elsevier DO - 10.1016/j.crme.2010.11.005 LA - en ID - CRMECA_2011__339_2-3_96_0 ER -
%0 Journal Article %A Emmanuel Agullo %A Luc Giraud %A Abdou Guermouche %A Jean Roman %T Parallel hierarchical hybrid linear solvers for emerging computing platforms %J Comptes Rendus. Mécanique %D 2011 %P 96-103 %V 339 %N 2-3 %I Elsevier %R 10.1016/j.crme.2010.11.005 %G en %F CRMECA_2011__339_2-3_96_0
Emmanuel Agullo; Luc Giraud; Abdou Guermouche; Jean Roman. Parallel hierarchical hybrid linear solvers for emerging computing platforms. Comptes Rendus. Mécanique, High Performance Computing, Volume 339 (2011) no. 2-3, pp. 96-103. doi : 10.1016/j.crme.2010.11.005. https://comptes-rendus.academie-sciences.fr/mecanique/articles/10.1016/j.crme.2010.11.005/
[1] A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Comput. Syst. Appl., Volume 35 (2009), pp. 38-53
[2] E. Chan, E.S. Quintana-Orti, G. Gregorio Quintana-Orti, R. van de Geijn, Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, in: Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures SPAA'07, June 2007, pp. 116–125.
[3] J. Demmel, L. Grigori, M. Hoemmen, J. Langou, Communication-optimal parallel and sequential QR and LU factorizations, LAPACK Working Note 204, August 2008.
[4] E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, S. Tomov, Faster, cheaper, better – a hybridization methodology to develop linear algebra software for GPUs, Technical Report 230, LAPACK Working Note, September 2010.
[5] G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, H. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, P. Luszczek, A. YarKhan, J. Dongarra, Distributed-memory task execution and dependence tracking within DAGuE and the DPLASMA project, Technical Report 232, LAPACK Working Note, September 2010.
[6] Accuracy and Stability of Numerical Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002
[7] Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia, 2003
[8] Methods of conjugate gradients for solving linear system, J. Res. Nat. Bur. Stds. B, Volume 49 (1952), pp. 409-436
[9] GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comp., Volume 7 (1986), pp. 856-869
[10] Multigrid Methods and Applications, Springer-Verlag, Berlin, 1985
[11] Parallel Processing for Scientific Computing, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2006
[12] Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, PA, 1994
[13] Numerical stability of the GMRES method, BIT, Volume 35 (1995), pp. 309-330
[14] Iterative Methods for Solving Linear Systems, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997
[15] Domain Decomposition, Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, New York, 1996
[16] Domain Decomposition Methods for Partial Differential Equations, Numerical Mathematics and Scientific Computation, Oxford Science Publications, Oxford, 1999
[17] Domain Decomposition Methods for the Numerical Solution of Partial Differential Equations, Springer Lecture Notes in Computational Science and Engineering, Springer, 2008
[18] G. Karypis, V. Kumar, MeTiS – Unstructured graph partitioning and sparse matrix ordering system, version 2.0, University of Minnesota, June 1995.
[19] PT-SCOTCH: a tool for efficient parallel graph ordering, Parallel Comput., Volume 34 (2008) no. 6–8
[20] Local preconditioners for two-level non-overlapping domain decomposition methods, Numer. Linear Algebra Appl., Volume 8 (2001) no. 4, pp. 207-227
[21] Parallel scalability study of hybrid preconditioners in three dimensions, Parallel Comput., Volume 34 (2008), pp. 363-379
[22] A. Haidar, On the parallel scalability of hybrid solvers for large 3D problems, PhD dissertation, INPT, June 2008. TH/PA/08/57.
[23] Overlapping domain decomposition algorithms for general sparse matrices, Numer. Linear Algebra Appl., Volume 3 (1996), pp. 221-237
[24] Parallel conjugate gradient-like algorithms for solving nonsymmetric linear systems on a vector multiprocessor, Parallel Comput., Volume 11 (1989), pp. 223-239
[25] Variational formulation and algorithm for trace operator in domain decomposition calculations (Tony Chan; Roland Glowinski; Jacques Périaux; Olof Widlund, eds.), Domain Decomposition Methods, SIAM, Philadelphia, PA, 1989, pp. 3-16
[26] Analysis and test of a local domain decomposition preconditioner (Roland Glowinski; Yuri Kuznetsov; Gérard Meurant; Jacques Périaux; Olof Widlund, eds.), Fourth International Symposium on Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, PA, 1991, pp. 112-128
[27] Iterative methods for the solution of elliptic problems on regions partitioned into substructures, SIAM J. Numer. Anal., Volume 23 (1986) no. 6, pp. 1093-1120
[28] PaStiX: A high-performance parallel direct solver for sparse symmetric definite systems, Parallel Comput., Volume 28 ( January 2002 ) no. 2, pp. 301-321
[29] An asynchronous parallel supernodal algorithm for sparse Gaussian elimination, SIAM J. Matrix Anal. Appl., Volume 20 (1999) no. 4, pp. 915-952
[30] PLASMA users' guide, parallel linear algebra software for multicore architectures, November 2009 http://icl.cs.utk.edu/plasma (version 2.0)
[31] MAGMA, version 0.2, user guide, November 2009 http://icl.cs.utk.edu/magma
[32] J. Kurzak, J. Dongarra, Implementation of the mixed-precision high performance LINPACK benchmark on the CELL processor, Technical Report LAPACK Working Note #177 UT-CS-06-580, University of Tennessee Computer Science, September 2006.
[33] J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Dongarra, Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy, Technical Report LAPACK Working Note #175 UT-CS-06-574, University of Tennessee Computer Science, April 2006.
Cited by Sources:
Comments - Policy