PLDI 2024
Mon 24 - Fri 28 June 2024 Copenhagen, Denmark
Wed 26 Jun 2024 16:20 - 16:40 at Sweden - Fast Linear Algebra Chair(s): Zachary Tatlock

Data-parallel computations, such as linear algebra routines (BLAS) and stencil computations, constitute one of the most important classes in parallel computing, e.g., due to their relevance for deep learning. Efficiently de-composing such computations across the memory and core hierarchies of modern architectures and re-composing the computed intermediate results into the final result – we say (de/re)-composition for short – is key to achieving high performance for these computations on, e.g., GPUs and CPUs. Current high-level approaches to generating data-parallel code are often restricted to a particular subclass of data-parallel computations and architectures (e.g., only linear algebra routines on only GPUs, or only stencil computations), and/or they rely on a user-guided optimization process to find a well-performing (de/re)-composition, which is complex and error-prone for the user.
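
To make the (de/re)-composition idea concrete, the following is a minimal sketch, assuming a dot product as the data-parallel computation: the input is de-composed into chunks, partial results are computed independently (here via a thread pool), and the partials are re-composed with the combine operator (+). The names decompose, recompose, and NUM_PARTS are illustrative only and not part of the MDH formalism's API.

```python
# Minimal (de/re)-composition sketch for a dot product (illustrative only).
from concurrent.futures import ThreadPoolExecutor

NUM_PARTS = 4  # hypothetical tuning parameter: number of parallel parts

def decompose(xs, ys, parts):
    """De-compose the inputs into `parts` contiguous chunks."""
    step = (len(xs) + parts - 1) // parts
    return [(xs[i:i + step], ys[i:i + step]) for i in range(0, len(xs), step)]

def partial_dot(chunk):
    """Compute one intermediate (partial) result."""
    xs, ys = chunk
    return sum(x * y for x, y in zip(xs, ys))

def recompose(partials):
    """Re-compose intermediate results with the combine operator (+)."""
    return sum(partials)

xs, ys = list(range(1000)), list(range(1000))
with ThreadPoolExecutor(max_workers=NUM_PARTS) as pool:
    partials = pool.map(partial_dot, decompose(xs, ys, NUM_PARTS))
# Same result as the sequential sum(x * y for x, y in zip(xs, ys)).
print(recompose(partials))
```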

We formally introduce a systematic (de/re)-composition approach based on the algebraic formalism of Multi-Dimensional Homomorphisms (MDHs) (https://mdh-lang.org). Our approach is designed to be general enough to apply to a wide range of data-parallel computations and to various kinds of target parallel architectures. To efficiently target the deep and complex memory and core hierarchies of contemporary architectures, we exploit our (de/re)-composition approach for a correct-by-construction, parametrized cache blocking and parallelization strategy. We show that our approach is powerful enough to express, in the same formalism, the (de/re)-composition strategies of different classes of state-of-the-art approaches (scheduling-based, polyhedral, etc.), and we demonstrate that the parameters of our strategies enable systematically generating code that can be fully automatically optimized (auto-tuned) for the particular target architecture and the characteristics of the input and output data (e.g., their sizes and memory layouts). In particular, our experiments confirm that, via auto-tuning, we achieve higher performance than state-of-the-art approaches, including hand-optimized solutions provided by vendors (such as NVIDIA cuBLAS/cuDNN and Intel oneMKL/oneDNN), on real-world data sets and for a variety of data-parallel computations, including: linear algebra routines, stencil and quantum chemistry computations, data mining algorithms, and computations that have recently gained attention due to their relevance for deep learning.
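
As an illustration of a parametrized cache blocking strategy, the following is a minimal sketch of tiled matrix multiplication in which the tile sizes TI, TJ, and TK are exactly the kind of parameters an auto-tuner would search over. The code is a hypothetical illustration, not the code generated by the MDH approach.

```python
# Minimal sketch of parametrized cache blocking (tiling) for matmul.
# TI, TJ, TK are hypothetical tuning parameters, not the paper's API.
import numpy as np

def matmul_tiled(A, B, TI=32, TJ=32, TK=32):
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, TI):          # de-compose the iteration space
        for j0 in range(0, m, TJ):      # into TI x TJ x TK tiles ...
            for k0 in range(0, k, TK):
                # ... and re-compose partial tile products by addition.
                C[i0:i0 + TI, j0:j0 + TJ] += (
                    A[i0:i0 + TI, k0:k0 + TK] @ B[k0:k0 + TK, j0:j0 + TJ])
    return C

A, B = np.random.rand(128, 128), np.random.rand(128, 128)
assert np.allclose(matmul_tiled(A, B), A @ B)
```

Choosing TI, TJ, and TK so that the three tiles fit into a given cache level is precisely the kind of architecture- and input-dependent decision that the approach delegates to auto-tuning.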

Wed 26 Jun

Displayed time zone: Windhoek

16:00 - 17:20
Fast Linear Algebra (PLDI Research Papers) at Sweden
Chair(s): Zachary Tatlock (University of Washington)
16:00 (20m) Talk
A Verified Compiler for a Functional Tensor Language
PLDI Research Papers
Amanda Liu (Massachusetts Institute of Technology), Gilbert Bernstein (University of Washington, Seattle), Adam Chlipala (Massachusetts Institute of Technology), Jonathan Ragan-Kelley (Massachusetts Institute of Technology)
DOI
16:20 (20m) Talk
[TOPLAS] (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms
PLDI Research Papers
Ari Rasch (University of Münster)
Link to publication · DOI · Pre-print · Media Attached
16:40 (20m) Talk
Compilation of Modular and General Sparse Workspaces
PLDI Research Papers
Genghan Zhang (Stanford University), Olivia Hsu (Stanford University), Fredrik Kjolstad (Stanford University)
DOI
17:00 (20m) Talk
Descend: A Safe GPU Systems Programming Language
PLDI Research Papers
Bastian Köpcke (University of Münster), Sergei Gorlatch (University of Münster), Michel Steuwer (Technische Universität Berlin)
DOI · Pre-print