PLDI 2024
Mon 24 - Fri 28 June 2024 Copenhagen, Denmark

Accelerators featuring mixed-precision tensor operations significantly enhance performance for many error-tolerant computing tasks, but their applicability is limited in scenarios demanding high precision. While emulating higher-precision data types from lower-precision ones can bridge this gap, existing techniques either struggle to achieve sufficient accuracy or incur excessive overhead, negating the performance gains. To address this issue, we propose MixPert, a novel system that balances performance and accuracy via single-precision emulation on GPU Integer Tensor Cores. MixPert devises an efficient data layout and optimizes the computation pipeline on Tensor Cores. By analyzing performance-precision trade-offs in depth, MixPert provides users with multiple configurations based on their accuracy requirements. Furthermore, MixPert integrates seamlessly with compilers, enabling automatic adaptation and tuning of mixed-precision parameters. Evaluations on real-world scientific computing and deep learning applications demonstrate that MixPert achieves an average speedup of 1.72× over cuBLAS on general-purpose cores. While delivering improved accuracy, MixPert also outperforms state-of-the-art approaches such as APE and CUTLASS by 1.22× and 1.21×, respectively.
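
The core idea behind this kind of emulation is to split each single-precision operand into several low-precision integer slices, multiply the slices with the int8×int8→int32 operation that Integer Tensor Cores provide natively, and recombine the partial products with power-of-two scaling. The sketch below is a scalar, host-side illustration of that decomposition and recombination only; it is not MixPert's actual data layout or kernel, and the 7-bit slice width and slice counts are illustrative assumptions. It does show the trade-off the abstract refers to: more slices mean more integer multiplications but higher accuracy.

```cpp
// Conceptual sketch of low-precision integer emulation of an FP32 multiply.
// Not MixPert's implementation: slice width (7 bits) and recombination order
// are illustrative assumptions chosen to keep the example short.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Split x into `slices` signed 8-bit chunks of its mantissa so that
// x ~= sum_i chunks[i] * 2^(exp - 7*(i+1)), where exp is the shared exponent.
std::vector<int8_t> split_fp32(float x, int slices, int& exp) {
    double m = std::frexp(static_cast<double>(x), &exp);   // x = m * 2^exp, |m| < 1
    std::vector<int8_t> chunks(slices);
    for (int i = 0; i < slices; ++i) {
        double scaled = std::ldexp(m, 7);                  // next 7 mantissa bits
        int8_t c = static_cast<int8_t>(std::trunc(scaled)); // fits in [-127, 127]
        chunks[i] = c;
        m = scaled - c;                                     // remainder, |m| < 1
    }
    return chunks;
}

// Emulate a * b using only int8 x int8 -> int32 products (the native
// Integer Tensor Core operation) plus floating-point recombination.
float emulated_mul(float a, float b, int slices) {
    int ea = 0, eb = 0;
    std::vector<int8_t> ca = split_fp32(a, slices, ea);
    std::vector<int8_t> cb = split_fp32(b, slices, eb);
    double acc = 0.0;
    for (int i = 0; i < slices; ++i)
        for (int j = 0; j < slices; ++j) {
            int32_t p = static_cast<int32_t>(ca[i]) * cb[j]; // int8*int8 -> int32
            acc += std::ldexp(static_cast<double>(p), ea + eb - 7 * (i + j + 2));
        }
    return static_cast<float>(acc);
}

int main() {
    float a = 3.14159265f, b = 2.71828182f;
    float exact = a * b;
    for (int slices = 1; slices <= 4; ++slices) {
        float approx = emulated_mul(a, b, slices);
        std::printf("slices=%d  result=%.9g  abs error=%.3g\n",
                    slices, approx, std::fabs(approx - exact));
    }
    return 0;
}
```

In a real kernel the slice products for whole matrix tiles would be accumulated in int32 on the Tensor Cores and rescaled per tile rather than per product, but the accuracy behavior is the same: each additional slice roughly doubles the number of integer multiplications while adding about seven more correct mantissa bits, which is the performance-precision knob that configurations like MixPert's expose.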