Don’t Write, but Return: Replacing Output Parameters with Algebraic Data Types in C-to-Rust Translation (PLDI 2024 - PLDI Research Papers)

Who

Jaemin Hong, Sukyoung Ryu

Track

PLDI 2024 PLDI Research Papers

Time Zone

The program is currently displayed in (GMT+02:00) Windhoek.

When

Wed 26 Jun 2024 12:00 - 12:20 at Sweden - Parsing and Compiling and Transforming. Chair(s): Stephen Kell

Abstract

Translating legacy system programs from C to Rust is a promising way to enhance their reliability. To alleviate the burden of manual translation, automatic C-to-Rust translation is desirable. However, existing translators fail to generate Rust code fully utilizing Rust’s language features, including algebraic data types. In this work, we focus on tuples and Option/Result types, an important subset of algebraic data types. They are used as functions’ return types to represent those returning multiple values and those that may fail. Due to the absence of these types, C programs use output parameters, i.e., pointer-type parameters for producing outputs, to implement such functions. As output parameters make code less readable and more error-prone, their use is discouraged in Rust. To address this problem, this paper presents a technique for removing output parameters during C-to-Rust translation. This involves three steps: (1) syntactically translating C code to Rust using an existing translator; (2) analyzing the Rust code to extract information related to output parameters; and (3) transforming the Rust code using the analysis result. The second step poses several challenges, including the identification and classification of output parameters. To overcome these challenges, we propose a static analysis based on abstract interpretation, complemented by the notion of abstract read/write sets, which approximate the sets of read/written pointers, and two sensitivities: write set sensitivity and nullity sensitivity. Our evaluation shows that the proposed technique is (1) scalable, with the analysis and transformation of 190k LOC within 213 seconds, (2) useful, with the detection of 1,670 output parameters across 55 real-world C programs, and (3) mostly correct, with 25 out of 26 programs passing their test suites after the transformation.
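
To make the idea of replacing output parameters concrete, here is a minimal hand-written sketch in Rust. It is not output from the authors' tool. All function names (`parse_digit_raw`, `parse_digit`, `div_rem_raw`, `div_rem`) are hypothetical, and the "before" versions only approximate what a syntactic C-to-Rust translator might emit. The first pair shows an output parameter that is written only on success, which maps naturally to `Option`. The second pair shows one that is always written, which maps to a tuple.

```rust
// Hypothetical "before": roughly the shape of a syntactic translation of
// `int parse_digit(char c, int *out)`. Returns 0 on success, -1 on failure.
pub unsafe fn parse_digit_raw(c: u8, out: *mut i32) -> i32 {
    if c.is_ascii_digit() {
        // Success: the result is written through the output parameter.
        unsafe { *out = (c - b'0') as i32 };
        0
    } else {
        // Failure: `out` is left untouched; only the status code signals it.
        -1
    }
}

// Hypothetical "after": the output parameter that is written only on success
// becomes the payload of an `Option`, and the status code disappears.
pub fn parse_digit(c: u8) -> Option<i32> {
    if c.is_ascii_digit() {
        Some((c - b'0') as i32)
    } else {
        None
    }
}

// Hypothetical "before": `rem` is written on every path.
// Assumes `b != 0`, as the C original would.
pub unsafe fn div_rem_raw(a: i32, b: i32, rem: *mut i32) -> i32 {
    unsafe { *rem = a % b };
    a / b
}

// Hypothetical "after": an always-written output parameter joins the
// original return value in a tuple. Still assumes `b != 0`.
pub fn div_rem(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}
```

In the rewritten versions, callers no longer need `unsafe`, a pre-initialized variable, or a separate status check: `parse_digit(b'7')` can be matched on directly, and `let (q, r) = div_rem(7, 2);` binds both results at once.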

DOI

https://doi.org/10.1145/3656406

Jaemin Hong

KAIST

South Korea

Sukyoung Ryu