The C++ Direction Group has set a future direction for C++, including recommendations for C++ in the short and medium term. These will have an immediate impact on what enters C++20 and beyond. The first half of this talk is devoted to the issue of the 3 Ps: Performance, Portability, and Productivity. The remainder considers, from my perspective as a member of the Direction Group, its description of where future C++ is heading, including guidance towards Heterogeneous C++.
The SYCL standard from the Khronos Group is a strong candidate to implement this upcoming C++ standard, as are many other C++ frameworks from the DOE, and HPX for the distributed case. One of the core ideas of this standard is that everything must be standard C++; the only exception is that some features of C++ cannot be used in code that executes on an OpenCL device, often due to hardware limitations.
Implementing Heterogeneous C++ is like battling the Four Horsemen of the Apocalypse. These are:
- Data Movement
- Data Locality
- Data Layout
- Data Affinity
The rest of this talk presents some of the challenges of, and solutions for, implementing a Heterogeneous C++ standard in Clang, based on our implementation of Khronos's SYCL language with Codeplay's ComputeCpp compiler and on my previous experience with OpenMP/OpenACC. With the fast growth of C++, Clang has become a platform of choice for prototyping many new C++ features. We also demonstrate the ecosystem that Codeplay has built to support machine learning using SYCL.
Michael Wong is the Vice President of Research and Development at Codeplay Software, a Scottish company that produces compilers, debuggers, runtimes, testing systems, and other specialized tools to aid software development for heterogeneous systems, accelerators, and special-purpose processor architectures, including GPUs and DSPs. He is a member of the open consortia Khronos, MISRA, and AUTOSAR, and is Chair of SYCL, the Khronos C++ heterogeneous programming language used for GPU dispatch in native modern C++ (14/17) and OpenCL; he also guides the research and development teams of ComputeSuite and ComputeAorta/ComputeCpp. For twenty years he was the Senior Technical Strategy Architect for IBM compilers.
He is the Canadian Head of Delegation to the ISO C++ Standard and a past CEO of OpenMP. He is also a founding member of the ISO C++ Direction Group, a Director and VP of ISOCPP.org, and Chair of all Programming Languages for Canada's Standards Council. He also participates in ISO SC42 on AI and ML. He has so many titles, it's a wonder he can get anything done. He chairs WG21 SG14 (Games Development/Low Latency/Financial/Embedded Devices) and WG21 SG19 (Machine Learning), and is the co-author of a book on C++ and of a number of C++/OpenMP/Transactional Memory features, including generalized attributes, user-defined literals, inheriting constructors, weakly ordered memory models, and explicit conversion operators. Having been the past C++ team lead for IBM's XL C++ compiler, he has been messing around with designing the C++ language and C++ compilers for twenty-five years. His current research interests (i.e., what he would like to do if he had time) are in parallel programming, future programming models for neural networks, AI, machine vision, safety-critical programming vulnerabilities, self-driving cars and low-power devices, lock-free programming, transactional memory, C++ benchmark performance, the object model, generic programming, and template metaprogramming. He holds a B.Sc. from the University of Toronto and a Master's in Mathematics from the University of Waterloo.
Many think deep learning is mostly about dense computation. However, sparse linear algebra plays an important role in deep learning models used at Facebook, for example in embedding tables for categorical features in recommendation models and in model pruning. These use cases pose challenges similar to those of scientific computing applications that use sparse linear algebra, such as demand for high memory capacity/bandwidth and high-bandwidth/low-latency communication. I am going to talk about the similarities and differences between DL and HPC, and about optimizations we have successfully applied to our use cases, such as reduced-precision embeddings, block-structured pruning, and run-time code generation. I hope my talk will help share insights from both fields and encourage more collaboration.
Jongsoo Park is a technical lead and manager on the Facebook AI Systems Co-design team. He was a recipient of the SC Best Paper Award for his low-communication FFT work, and a main contributor to winning entries of the HPCG benchmark over several years. He received his PhD from Stanford University.