Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation (LCTES 2024 - Languages, Compilers, Tools and Theory of Embedded Systems)

Who

Shangqing Liu, Wei Ma, Jian Wang, Xiaofei Xie, Ruitao Feng, Yang Liu

Track

LCTES 2024

Time Zone

The program is currently displayed in (GMT+02:00) Windhoek.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 24 Jun 2024 16:45 - 17:00 at Iceland - Analysis and Testing Chair(s): Jason Xue

Abstract

Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task, i.e., determining whether a piece of code is vulnerable or not. This makes it difficult for a single deep-learning-based model to effectively learn the wide array of vulnerability characteristics. Furthermore, because large-scale vulnerability data is hard to collect, these detectors often overfit to limited training datasets, which lowers their generalization performance.

To address these challenges, we introduce a fine-grained vulnerability detector named FGVulDet. Unlike previous approaches, FGVulDet employs multiple classifiers to discern the characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability. Each classifier is designed to learn type-specific vulnerability semantics. Additionally, to address the scarcity of data for some vulnerability types and to enhance data diversity for learning better vulnerability semantics, we propose a novel vulnerability-preserving data augmentation technique that increases the number of vulnerable samples. Taking inspiration from recent advancements in graph neural networks for learning program semantics, we incorporate a Gated Graph Neural Network (GGNN) and extend it to an edge-aware GGNN to capture edge-type information. FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities. Extensive experiments comparing FGVulDet with static-analysis-based and learning-based approaches demonstrate its effectiveness.
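The abstract does not say how the outputs of the type-specific classifiers are combined. As a rough illustration only, the sketch below assumes each classifier emits a score in [0, 1] for its own vulnerability type and that the highest score at or above a threshold wins. The function name `combine_type_scores`, the threshold value, and the type labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional


def combine_type_scores(scores: Dict[str, float],
                        threshold: float = 0.5) -> Optional[str]:
    """Pick one vulnerability type from per-type classifier scores.

    Assumption (not from the paper): each type-specific classifier
    returns a score in [0, 1], and the type with the highest score at
    or above `threshold` is reported. None means no classifier flagged
    the code.
    """
    # Keep only the types whose classifier considers the code vulnerable.
    flagged = {vuln_type: s for vuln_type, s in scores.items() if s >= threshold}
    if not flagged:
        return None
    # Report the type with the highest score among the flagged ones.
    return max(flagged, key=flagged.get)


# Hypothetical scores from three type-specific classifiers.
example_scores = {"type_a": 0.12, "type_b": 0.81, "type_c": 0.55}
print(combine_type_scores(example_scores))
```

Other combination rules, such as voting across classifiers, would fit the same interface. The point of the sketch is only that one binary decision per type can be turned into a single fine-grained label.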

Shangqing Liu, Nanyang Technological University

Wei Ma, Nanyang Technological University, Singapore

Jian Wang, Nanyang Technological University, China

Xiaofei Xie, Singapore Management University, Singapore

Ruitao Feng, SMU, Singapore

Yang Liu, Nanyang Technological University, Singapore

Session Program

Mon 24 Jun
Displayed time zone: Windhoek

16:00 - 17:40	Analysis and Testing (LCTES) at Iceland. Chair(s): Jason Xue (MBZUAI)

16:00 15m Talk		EVMBT: A Binary Translation Scheme for Upgrading EVM Smart Contracts to WASM (LCTES). Weimin Chen (The Hong Kong Polytechnic University), Xiapu Luo (The Hong Kong Polytechnic University), Haoyu Wang (Huazhong University of Science and Technology), Heming Cui (University of Hong Kong), Shuyu Zheng (Peking University), Xuanzhe Liu (Peking University)
16:15 15m Talk		CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques (LCTES). Lichen Jia (Institute of Computing Technology, Chinese Academy of Sciences), Chenggang Wu (Institute of Computing Technology at Chinese Academy of Sciences; University of Chinese Academy of Sciences; Zhongguancun Laboratory), Zhe Wang (Institute of Computing Technology at Chinese Academy of Sciences; Zhongguancun Laboratory), Peihua Zhang
16:30 15m Talk		Foundations for a Rust-Like Borrow Checker for C (LCTES). Tiago Silva (University of Porto), João Bispo (Faculdade de Engenharia e Universidade do Porto), Tiago Carvalho (University of Porto)
16:45 15m Talk		Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation (LCTES). Shangqing Liu (Nanyang Technological University), Wei Ma (Nanyang Technological University, Singapore), Jian Wang (Nanyang Technological University), Xiaofei Xie (Singapore Management University), Ruitao Feng (SMU), Yang Liu (Nanyang Technological University)
17:00 15m Talk		(WIP) A Flexible-Granularity Task Graph Representation and its Generation from C Applications (LCTES). Tiago Santos (Faculty of Engineering, University of Porto), João Bispo (Faculdade de Engenharia e Universidade do Porto), João M. P. Cardoso (University of Porto and INESC TEC, Portugal)
17:15 25m Day closing		Award and Closing (LCTES)