The future of high-performance computing (HPC) lies with large distributed parallel systems that exhibit three levels of parallelism: thousands of nodes, each containing MIMD (multiple instruction, multiple data) groups of SIMD (single instruction, multiple data) processors. For the past 50 years, the clock cycle of single processors has decreased steadily, and the cost of each processor has decreased as well. Most applications gained speed in proportion to these clock-cycle and cost improvements simply by running on new hardware every few years, with little or no additional programming effort. That era, however, is over. To reap the benefit of new hardware advances, users must now employ parallel algorithms in their application codes.
In the near term, the HPC community will see an evolution in the architecture of its basic building block processors. Although clock cycles are no longer decreasing in these newer chips, Moore's law still holds: circuitry grows denser, and multiple processors share the die. The net result is more floating-point operations per second (FLOPS), produced by wider functional units on each core and by multiple cores on each chip. In addition, a new source of energy-efficient performance has entered the HPC community: traditional graphics processing units have become reliable and powerful enough at high-precision operations to be viable for HPC applications.
High Performance Computing: Programming and Applications presents techniques that address new performance issues in the programming of high performance computing (HPC) applications. Omitting tedious details, the book discusses hardware architecture concepts and programming techniques that are the most pertinent to application developers for achieving high performance. Even though the text concentrates on C and Fortran, the techniques described can be applied to other languages, such as C++ and Java.
Drawing on their experience with chips from AMD and systems, interconnects, and software from Cray Inc., the authors explore the problems that create bottlenecks in attaining good performance. They cover techniques that pertain to each of the three levels of parallelism:
- Message passing between the nodes
- Shared memory parallelism on the nodes or the multiple instruction, multiple data (MIMD) units on the accelerator
- Vectorization on the inner level
After discussing architectural and software challenges, the book outlines a strategy for porting and optimizing an existing application to a large massively parallel processor (MPP) system. With a look toward the future, it also introduces the use of general purpose graphics processing units (GPGPUs) for carrying out HPC computations. A companion website at www.hybridmulticoreoptimization.com contains all the examples from the book, along with updated timing results on the latest released processors.