PLDI 2024
Mon 24 - Fri 28 June 2024 Copenhagen, Denmark
Tue 25 Jun 2024 14:55 - 15:20 at Stockholm - Performance

Scans are a fundamental primitive for array languages, and enable efficient parallel implementations of many problems. Parallel scans for multi-core CPUs come at a cost, however: their single-threaded performance is around 50% to 75% of the speed of a sequential scan. We present our \textit{assisted reduce-then-scan}, an adaptation of reduce-then-scan with no single-threaded overhead and slightly better multi-threaded performance than reduce-then-scan. Furthermore we show that chained scans, the state-of-the-art scan algorithm for GPUs, are also suitable for CPUs, outperforming (assisted) reduce-then-scan. Our \textit{adaptive chained scan} has zero single-threaded overhead, and equal multi-threaded performance to the standard chained scan. Our algorithms allow more threads to join the workload of a scan during its execution, which may happen unpredictably if the program exploits both data and task parallelism. This robustness is especially important for the implementation of parallel array languages, which should work well on a wide range of programs and hardware.

Extended Abstract (array24-paper15.pdf)357KiB

Tue 25 Jun

Displayed time zone: Windhoek change