Date: November 16-20, 2015
Location: Conference Room Ⅱ, CSRC Home Building
Objective: To familiarize CSRC researchers with state-of-the-art linear algebra algorithms for high-performance scientific computing
Lecturer: Prof. Chao Yang, Senior Research Scientist, Computational Research Division, Lawrence Berkeley National Laboratory
Date | Schedule | Time
Nov 16 - Nov 20, 2015 | Session 1: Lecture | 2:00pm - 3:00pm
 | Session 2: Free Discussion | 3:00pm - 4:00pm
 | Session 3: Lecture | 4:00pm - 5:00pm
Day 1: Basics of modern computer architecture and high-performance computing, performance models, profiling tools, and general performance-tuning techniques
1) Processing units:
Vector units
Parallel processing:
Instruction-level parallelism
Task parallelism vs. data parallelism
Thread-level parallelism (shared memory)
Distributed-memory parallelism
Memory:
Latency vs. bandwidth
Hierarchy
Cache coherence
Interconnect:
Theoretical latency vs. bandwidth
Node/core topology and effective bandwidth
2) Performance profiling and optimization
Profiling tools: hardware counters, tracing, TAU, PAPI, IPM
Optimization techniques
Loop fusion/unrolling, blocking, branch elimination
Cache-miss reduction, overlapping communication with floating-point operations, hybrid OpenMP/MPI, load balancing, synchronization (cache blocking is sketched after this list)
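As a concrete illustration of one of the optimization techniques listed above, here is a minimal C sketch of cache blocking applied to a dense matrix multiply. It is not taken from the lecture material; the matrix size N, block size BS, and test data are assumptions chosen for illustration and would normally be tuned to the cache hierarchy of the target machine.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define N  512   /* matrix dimension (assumed for illustration) */
#define BS 64    /* block size: pick BS so three BS x BS tiles fit in cache */

/* naive i-j-k loop: the innermost loop walks down a column of B,
 * touching a new cache line on almost every iteration */
static void matmul_naive(const double *A, const double *B, double *C)
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                C[i*N + j] += A[i*N + k] * B[k*N + j];
}

/* blocked version: work on BS x BS tiles so data is reused
 * while it is still resident in cache */
static void matmul_blocked(const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < N; ii += BS)
        for (int kk = 0; kk < N; kk += BS)
            for (int jj = 0; jj < N; jj += BS)
                for (int i = ii; i < ii + BS; ++i)
                    for (int k = kk; k < kk + BS; ++k) {
                        double a = A[i*N + k];
                        for (int j = jj; j < jj + BS; ++j)
                            C[i*N + j] += a * B[k*N + j];
                    }
}

int main(void)
{
    double *A  = malloc(N*N * sizeof *A);
    double *B  = malloc(N*N * sizeof *B);
    double *C1 = calloc(N*N, sizeof *C1);
    double *C2 = calloc(N*N, sizeof *C2);
    for (int i = 0; i < N*N; ++i) { A[i] = 1.0; B[i] = 2.0; }

    matmul_naive(A, B, C1);     /* time these two calls separately to see the effect */
    matmul_blocked(A, B, C2);

    /* both variants compute the same product */
    double maxdiff = 0.0;
    for (int i = 0; i < N*N; ++i)
        if (fabs(C1[i] - C2[i]) > maxdiff) maxdiff = fabs(C1[i] - C2[i]);
    printf("C[0] = %g (expect %g), max difference = %g\n", C1[0], 2.0*N, maxdiff);

    free(A); free(B); free(C1); free(C2);
    return 0;
}
```

For problem sizes that do not fit in cache, the blocked loop typically runs noticeably faster than the naive one; profiling tools such as PAPI hardware counters can be used to confirm the drop in cache misses.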
Day 2: BLAS, parallelization, scalability
1) BLAS (a sample DGEMM call is sketched after this list)
2) Linear systems (Gaussian elimination with partial pivoting)
Block algorithms
Parallel algorithms for shared-memory machines, dynamic scheduling
Parallel algorithms for distributed-memory systems
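A minimal sketch of calling a Level-3 BLAS routine (DGEMM) through the CBLAS interface. The 2x2 test matrices are assumptions for illustration, and the link command depends on which optimized BLAS is installed (e.g. cc example.c -lopenblas on a typical Linux system).

```c
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    /* C := alpha*A*B + beta*C with 2x2 row-major matrices */
    double A[4] = {1.0, 2.0,
                   3.0, 4.0};
    double B[4] = {5.0, 6.0,
                   7.0, 8.0};
    double C[4] = {0.0, 0.0,
                   0.0, 0.0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,        /* m, n, k               */
                1.0, A, 2,      /* alpha, A, lda         */
                B, 2,           /* B, ldb                */
                0.0, C, 2);     /* beta, C, ldc          */

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    /* expected result: [19 22; 43 50] */
    return 0;
}
```

Optimized BLAS libraries apply blocking, vectorization, and threading internally, which is why the block and parallel algorithms of Day 2 are built on top of these kernels rather than on raw loops.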
Day 3: Dense linear algebra (linear systems, least squares, and eigenvalue problems)
1) Least squares (QR factorization and SVD; a QR-based LAPACK solve is sketched after this list)
2) Eigenvalue problems
3) Tools: BLAS, LAPACK, ScaLAPACK, Elemental, ELPA
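A minimal sketch of a QR-based dense least-squares solve using one of the listed tools (LAPACK, through its LAPACKE C interface). The small line-fitting problem is an assumption chosen so the exact answer is known; link against LAPACKE/LAPACK (library names vary by installation).

```c
#include <stdio.h>
#include <lapacke.h>

int main(void)
{
    /* Fit y = c0 + c1*t to the points (0,1), (1,3), (2,5), (3,7);
     * the exact least-squares solution is c0 = 1, c1 = 2. */
    double A[4*2] = {1.0, 0.0,
                     1.0, 1.0,
                     1.0, 2.0,
                     1.0, 3.0};
    double b[4]   = {1.0, 3.0, 5.0, 7.0};

    /* DGELS solves min ||A*x - b|| via a QR factorization of A */
    lapack_int info = LAPACKE_dgels(LAPACK_ROW_MAJOR, 'N',
                                    4, 2, 1,   /* m, n, nrhs                      */
                                    A, 2,      /* A, lda                          */
                                    b, 1);     /* b, ldb; solution returned in b  */
    if (info != 0) {
        fprintf(stderr, "dgels failed, info = %d\n", (int)info);
        return 1;
    }
    printf("c0 = %g, c1 = %g\n", b[0], b[1]);  /* expect 1 and 2 */
    return 0;
}
```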
Day 4: Methods for solving sparse linear systems of equations (both direct and iterative methods)
1) Sparse matrices and storage formats
2) Sparse matrix-vector multiplication (a CSR-based sketch follows this list)
3) Sparse direct methods for solving linear systems
Matrix ordering
Symbolic factorization, elimination tree
Left-looking, right-looking, multifrontal
Shared memory parallel implementation
Distributed memory parallel implementation
Tools: MUMPS, PARDISO, SuperLU, METIS
4) Iterative methods
Linear equations: Jacobi, Gauss-Seidel, Krylov subspace methods, conjugate gradient, GMRES, etc.
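The sketch below ties together three of the Day 4 items: a compressed sparse row (CSR) matrix, its matrix-vector product, and an unpreconditioned conjugate gradient (CG) loop built on that kernel. The 4x4 model matrix, right-hand side, and tolerance are assumptions chosen for illustration only.

```c
#include <math.h>
#include <stdio.h>

#define N 4

/* CSR representation of the symmetric positive definite matrix
 *     [ 2 -1  0  0 ]
 *     [-1  2 -1  0 ]
 *     [ 0 -1  2 -1 ]
 *     [ 0  0 -1  2 ]   (a small 1-D Laplacian model problem) */
static const int    row_ptr[N+1] = {0, 2, 5, 8, 10};
static const int    col_idx[10]  = {0, 1,  0, 1, 2,  1, 2, 3,  2, 3};
static const double val[10]      = {2,-1, -1, 2,-1, -1, 2,-1, -1, 2};

/* y = A*x for A stored in CSR format */
static void spmv(const double *x, double *y)
{
    for (int i = 0; i < N; ++i) {
        double s = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i+1]; ++k)
            s += val[k] * x[col_idx[k]];
        y[i] = s;
    }
}

static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < N; ++i) s += a[i] * b[i];
    return s;
}

int main(void)
{
    double b[N] = {1.0, 0.0, 0.0, 1.0};   /* right-hand side = A*[1 1 1 1]^T */
    double x[N] = {0.0, 0.0, 0.0, 0.0};   /* initial guess                   */
    double r[N], p[N], Ap[N];

    /* unpreconditioned CG: r = b - A*x = b (since x = 0), p = r */
    for (int i = 0; i < N; ++i) { r[i] = b[i]; p[i] = r[i]; }
    double rr = dot(r, r);

    for (int it = 0; it < 100 && sqrt(rr) > 1e-10; ++it) {
        spmv(p, Ap);                                    /* Ap = A*p                 */
        double alpha = rr / dot(p, Ap);                 /* step length              */
        for (int i = 0; i < N; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;                      /* search-direction update  */
        for (int i = 0; i < N; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }

    printf("x = [%g %g %g %g]  (expect [1 1 1 1])\n", x[0], x[1], x[2], x[3]);
    return 0;
}
```

Note that the only way CG touches the matrix is through spmv, which is why the storage format and the matrix-vector kernel are presented before the iterative methods themselves.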
Day 5: Methods for solving sparse eigenvalue problems and applications
1) Eigenvalue problems: Lanczos, Arnoldi, optimization-based approaches (a Lanczos sketch follows this list)
2) Applications
Electronic structure
Inverse problem
Data analysis
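A minimal sketch of the Lanczos process listed above: it builds the tridiagonal projection T (diagonal alpha, off-diagonal beta) of a symmetric matrix onto a Krylov subspace. The test matrix, starting vector, and step count are assumptions; in practice the eigenvalues of T would then be computed with a tridiagonal eigensolver such as LAPACK's DSTEV, and reorthogonalization would be added for robustness.

```c
#include <math.h>
#include <stdio.h>

#define N 4
#define M 4   /* maximum number of Lanczos steps (<= N) */

/* small dense symmetric test matrix (the same 1-D Laplacian used in the
 * Day 4 sketch); a real code would use a sparse matvec instead */
static const double A[N][N] = {
    { 2, -1,  0,  0},
    {-1,  2, -1,  0},
    { 0, -1,  2, -1},
    { 0,  0, -1,  2}
};

static void matvec(const double *x, double *y)
{
    for (int i = 0; i < N; ++i) {
        y[i] = 0.0;
        for (int j = 0; j < N; ++j)
            y[i] += A[i][j] * x[j];
    }
}

int main(void)
{
    double V[M+1][N];            /* Lanczos vectors v_1 .. v_{m+1}      */
    double alpha[M], beta[M+1];  /* entries of the tridiagonal matrix T */
    double w[N];
    int m = M;                   /* number of steps actually completed  */

    beta[0] = 0.0;
    for (int i = 0; i < N; ++i)  /* v_1: normalized starting vector     */
        V[0][i] = 1.0 / sqrt((double)N);

    for (int j = 0; j < M; ++j) {
        matvec(V[j], w);                                   /* w = A v_j            */
        if (j > 0)
            for (int i = 0; i < N; ++i) w[i] -= beta[j] * V[j-1][i];
        alpha[j] = 0.0;                                    /* alpha_j = v_j^T w    */
        for (int i = 0; i < N; ++i) alpha[j] += V[j][i] * w[i];
        for (int i = 0; i < N; ++i) w[i] -= alpha[j] * V[j][i];
        beta[j+1] = 0.0;                                   /* beta_{j+1} = ||w||_2 */
        for (int i = 0; i < N; ++i) beta[j+1] += w[i] * w[i];
        beta[j+1] = sqrt(beta[j+1]);
        if (beta[j+1] < 1e-12) { m = j + 1; break; }       /* invariant subspace   */
        for (int i = 0; i < N; ++i) V[j+1][i] = w[i] / beta[j+1];
    }

    /* eigenvalues of the m x m tridiagonal T approximate eigenvalues of A;
     * pass alpha/beta to a tridiagonal eigensolver (e.g. DSTEV) to get them */
    for (int j = 0; j < m; ++j)
        printf("alpha[%d] = %8.5f   beta[%d] = %8.5f\n", j, alpha[j], j+1, beta[j+1]);
    return 0;
}
```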
Bio-sketch of Lecturer:
PPT download