Zero-Overhead Parallel Scans for Multi-Core CPUs
Scans are a fundamental primitive for array languages, and enable efficient parallel implementations of many problems. Parallel scans for multi-core CPUs come at a cost, however: their single-threaded performance is around 50% to 75% of the speed of a sequential scan. We present our \textit{assisted reduce-then-scan}, an adaptation of reduce-then-scan with no single-threaded overhead and slightly better multi-threaded performance than reduce-then-scan. Furthermore we show that chained scans, the state-of-the-art scan algorithm for GPUs, are also suitable for CPUs, outperforming (assisted) reduce-then-scan. Our \textit{adaptive chained scan} has zero single-threaded overhead, and equal multi-threaded performance to the standard chained scan. Our algorithms allow more threads to join the workload of a scan during its execution, which may happen unpredictably if the program exploits both data and task parallelism. This robustness is especially important for the implementation of parallel array languages, which should work well on a wide range of programs and hardware.
Extended Abstract (array24-paper15.pdf) | 357KiB |
Tue 25 JunDisplayed time zone: Windhoek change
13:40 - 15:20 | |||
13:40 25mTalk | Apple Array Allocation ARRAY Vanessa McHale Northern Trust File Attached | ||
14:05 25mTalk | Shray: an Owner-Compute Distributed Shared-Memory System ARRAY Stefan Schrijvers Radboud University, Thomas Koopman Radboud University, Sven-Bodo Scholz Radboud University DOI | ||
14:30 25mTalk | Work Assisting: Linking Task-Parallel Work Stealing with Data-Parallel Self Scheduling ARRAY DOI | ||
14:55 25mTalk | Zero-Overhead Parallel Scans for Multi-Core CPUs ARRAY Ivo Gabe de Wolff Utrecht University, David van Balen , Gabriele Keller Utrecht University, Trevor L. McDonell Utrecht University File Attached |