Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPU (LCTES 2024 - Languages, Compilers, Tools and Theory of Embedded Systems)

Mon 24 - Fri 28 June 2024 Copenhagen, Denmark

Who

Kiung Jung, Seok Namkoong, Hongjun Um, Hyejun Kim, Youngsok Kim, Yongjun Park

Track

LCTES 2024

Time Zone

The program is currently displayed in (GMT+02:00) Windhoek.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 24 Jun 2024 13:55 - 14:10 at Iceland - Embedded Systems Chair(s): Jian-Jia Chen

Abstract

Mixed-precision quantization can reduce the computational requirements of Deep Neural Network (DNN) models with minimal loss of accuracy. Because executing mixed-precision DNN models on Neural Processing Units (NPUs) incurs significant under-utilization of computational resources, Precision-Scalable NPUs (PSNPUs), which can process multiple low-precision layers simultaneously, have been proposed. However, under-utilization remains significant due to the lack of adequate scheduling algorithms to support multiple mixed-precision models on PSNPUs. Therefore, in this paper, we propose a dynamic programming-based scheduling algorithm for the operations of multiple mixed-precision models. Our scheduling algorithm finds the optimal execution plan that exploits the precision-scalable MACs to improve the end-to-end inference latency of mixed-precision models. We evaluate the performance of this algorithm in terms of hardware utilization, inference latency, and schedule search time compared to baseline scheduling algorithms. The experimental results show 1.23× inference latency improvements over the baseline algorithms within the allowed minutes.
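
To make the idea of dynamic programming-based co-scheduling concrete, here is a minimal Python sketch of a toy version of the problem. It is not the authors' algorithm: the cost model, the function name min_makespan, the Layer tuple, and the mac_bits parameter are all assumptions made for illustration. The toy model assumes two models whose layers must each run in order, that one layer from each model may run at the same time when their precisions together fit in the MAC width, and that a co-scheduled pair takes as long as its slower layer.

```python
from typing import List, Tuple

# Hypothetical layer description: (precision in bits, latency when run alone).
Layer = Tuple[int, float]


def min_makespan(a: List[Layer], b: List[Layer], mac_bits: int = 8) -> float:
    """Toy DP: minimum total latency to finish all layers of models a and b.

    Assumptions (illustrative only, not taken from the paper):
      * layers of each model execute strictly in order;
      * one layer of a and one layer of b may run concurrently when
        bits_a + bits_b <= mac_bits;
      * a concurrent pair takes max(latency_a, latency_b).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j] = best latency after finishing the first i layers of a
    # and the first j layers of b.
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # run layer a[i-1] alone
                dp[i][j] = min(dp[i][j], dp[i - 1][j] + a[i - 1][1])
            if j > 0:  # run layer b[j-1] alone
                dp[i][j] = min(dp[i][j], dp[i][j - 1] + b[j - 1][1])
            if i > 0 and j > 0 and a[i - 1][0] + b[j - 1][0] <= mac_bits:
                # run a[i-1] and b[j-1] together on the shared MACs
                paired = max(a[i - 1][1], b[j - 1][1])
                dp[i][j] = min(dp[i][j], dp[i - 1][j - 1] + paired)
    return dp[n][m]


if __name__ == "__main__":
    model_a = [(4, 2.0), (8, 3.0), (4, 1.0)]
    model_b = [(4, 2.0), (4, 1.0)]
    print(min_makespan(model_a, model_b))              # 6.0 with pairing
    print(min_makespan(model_a, model_b, mac_bits=4))  # 9.0, fully serial
```

In this toy instance the two 4-bit layers at the start and at the end of each model are paired, while the 8-bit layer runs alone, so the total drops from 9.0 to 6.0. A real PSNPU scheduler would need a hardware-accurate cost model and support for more than two models.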

Kiung Jung

Yonsei University

Seok Namkoong

Yonsei University

Hongjun Um

Hanyang University

Hyejun Kim

Yonsei University

Youngsok Kim

Yonsei University

South Korea

Yongjun Park

Yonsei University

Session Program

Mon 24 Jun
Displayed time zone: Windhoek

13:40 - 15:20	Embedded Systems (LCTES) at Iceland, Chair(s): Jian-Jia Chen, TU Dortmund University

13:40 15m Talk		SmartVisor: User-Friendly Hypervisor for Mobile Robots (Remote) LCTES Guanyu Chen Zhejiang University, Pan Lv Zhejiang University, Hong Li Zhejiang University, Guoqing Yang Zhejiang University
13:55 15m Talk		Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPU LCTES Kiung Jung Yonsei University, Seok Namkoong Yonsei University, Hongjun Um Hanyang University, Hyejun Kim Yonsei University, Youngsok Kim Yonsei University, Yongjun Park Yonsei University
14:10 15m Talk		WoCA: Avoiding Intermittent Execution in Embedded Systems by Worst-Case Analyses with Device States LCTES Phillip Raffeck Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Johannes Maier Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Peter Wägemann Friedrich-Alexander University Erlangen-Nürnberg (FAU)
14:25 10m Break		Break - 10 minutes LCTES
14:35 15m Talk		Unmasking the Lurking: Malicious Behavior Detection for IoT Malware with Multi-label Classification LCTES Ruitao Feng SMU, Sen Li Tianjin University, Sen Chen Tianjin University, Mengmeng Ge Nanyang Technological University, Xuewei Li Tianjin University, Xiaohong Li Tianjin University
14:50 15m Talk		TWFuzz: Fuzzing Embedded Systems with Three Wires (Remote) LCTES Zhongwen Feng Chang'an University, Junyan Ma Chang'an University
15:05 15m Talk		OpenMP-RT: Native Pragma Support for Real-Time Tasks and Synchronization with LLVM under Linux LCTES Brayden McDonald North Carolina State University, Frank Mueller North Carolina State University, USA