|
For most of computing history, we benefited from exponential increases in performance
of scalar processors. That has come to an end. We are now at the dawn of
the heterogeneous parallel computing era. With all applications being power-sensitive
and all computing systems being power-limited, from mobile to cloud, future computing
platforms must embrace heterogeneity. For example, a fast-growing portion of the
top supercomputers in the world have become heterogeneous CPU + GPU computing
clusters. While the first-generation programming interfaces such as CUDA and OpenCL
have enabled development of new libraries and applications for these systems, there
has been a clear need for much higher productivity in heterogeneous parallel software
development.
The major challenge is that any programming interface that raises productivity in
this domain must also give programmers enough control to reach their performance
goals. C++ AMP from Microsoft is a major step forward in addressing this challenge.
The C++ AMP interface is a simple, elegant extension to the C++ language to address
two major weaknesses of previous interfaces. First, the previous approaches did not
fit well with C++ software engineering practice. Kernel-based parallel programming
models tend to disrupt the class organization of applications. Second, their
C-based indexing for dynamically allocated arrays complicates the code for managing
locality.
I am excited to see that C++ AMP supports the use of C++ loop constructs and
object-oriented features in parallel code to address the first issue and an array_view
construct to address the second issue. The array_view approach is forward-looking and
prepares applications to take full advantage of the upcoming unified address space
architectures. Many experienced CUDA and OpenCL programmers have found the
C++ AMP programming style refreshing, elegant, and effective.
Equally important, in my opinion, the C++ AMP interface opens the door for a
wide range of innovative compiler transformations, such as data layout adjustment and
thread granularity adjustment, to become mainstream. It also enables run-time implementation
optimizations on data movement. Such advancements will be needed for a
dramatic improvement in programmer productivity.
While C++ AMP is currently only implemented on Windows, the interface is open
and will likely be implemented on other platforms. There is great potential for the
C++ AMP interface to make an even bigger impact if and when the other platform
vendors begin to offer their own implementations of the interface. |