CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques
In Binary Code Similarity Detection (BCSD), the conventional approach is to take a target function in binary form and identify the set of functions most similar to it. These similar functions often originate from the same source code but differ because of variations in compilation settings. Such analysis is crucial for security applications, including vulnerability discovery and malware identification. Function inlining, a compiler optimization, embeds the code of a called function directly into the calling function. Because different compilation options (such as O1 and O3) inline to different degrees, binary functions derived from the same source code can differ significantly across compilation settings, which challenges the accuracy of state-of-the-art (SOTA) learning-based binary code similarity detection (LB-BCSD) methods. In contrast to function inlining, code extraction identifies and separates duplicate code within a program, replacing it with corresponding function calls. To overcome the impact of function inlining, this paper introduces a novel approach, CodeExtract. The method first uses code extraction techniques to transform code introduced by function inlining back into function calls. It then actively inlines the functions that cannot undergo code extraction, effectively eliminating the differences introduced by function inlining. Experimental validation shows that CodeExtract improves the performance of LB-BCSD models by 20% when addressing the challenges posed by function inlining.
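The interplay between inlining and extraction described in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: it assumes functions are flat lists of instruction tokens, that an inlined body appears verbatim in the caller (real compilers optimize further after inlining), and it uses plain Jaccard similarity as a stand-in for a learned model. All names (`inline_calls`, `extract_calls`, `clamp`, the token strings) are hypothetical.

```python
# Toy illustration only; token names and helpers are hypothetical.

def inline_calls(body, callees):
    """Replace each ('call', name) token with the callee's body (one level deep)."""
    out = []
    for tok in body:
        if isinstance(tok, tuple) and tok[0] == "call" and tok[1] in callees:
            out.extend(callees[tok[1]])
        else:
            out.append(tok)
    return out

def extract_calls(body, callees):
    """Replace each verbatim occurrence of a callee body with a call token."""
    out, i = [], 0
    while i < len(body):
        for name, seq in callees.items():
            if seq and body[i:i + len(seq)] == seq:
                out.append(("call", name))
                i += len(seq)
                break
        else:
            # No callee body starts at position i; keep the token as-is.
            out.append(body[i])
            i += 1
    return out

def jaccard(a, b):
    """Set-based similarity between two token sequences."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

callees = {"clamp": ["cmp", "cmovg", "cmovl"]}

# Same source function under two hypothetical optimization levels.
low_opt = ["load", ("call", "clamp"), "store"]   # call kept
high_opt = inline_calls(low_opt, callees)        # call inlined

before = jaccard(low_opt, high_opt)              # inlining lowers similarity
restored = extract_calls(high_opt, callees)      # turn inlined code back into a call
after = jaccard(low_opt, restored)

print(f"similarity before extraction: {before:.2f}")
print(f"similarity after extraction:  {after:.2f}")
```

In this toy setting, extraction makes the two variants identical again. The second step in the abstract, actively inlining functions that cannot be extracted, is not modeled here.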
Mon 24 Jun (displayed time zone: Windhoek)
16:00 - 17:40
16:00 | 15m | Talk | EVMBT: A Binary Translation Scheme for Upgrading EVM Smart Contracts to WASM | LCTES | Weimin Chen (The Hong Kong Polytechnic University), Xiapu Luo (The Hong Kong Polytechnic University), Haoyu Wang (Huazhong University of Science and Technology), Heming Cui (University of Hong Kong), Shuyu Zheng (Peking University), Xuanzhe Liu (Peking University)
16:15 | 15m | Talk | CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques | LCTES | Lichen Jia (Institute of Computing Technology, Chinese Academy of Sciences), Chenggang Wu (Institute of Computing Technology at Chinese Academy of Sciences; University of Chinese Academy of Sciences; Zhongguancun Laboratory), Zhe Wang (Institute of Computing Technology at Chinese Academy of Sciences; Zhongguancun Laboratory), Peihua Zhang
16:30 | 15m | Talk | Foundations for a Rust-Like Borrow Checker for C | LCTES | Tiago Silva (University of Porto), João Bispo (Faculdade de Engenharia e Universidade do Porto), Tiago Carvalho (University of Porto)
16:45 | 15m | Talk | Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation | LCTES | Shangqing Liu (Nanyang Technological University), Wei Ma (Nanyang Technological University, Singapore), Jian Wang (Nanyang Technological University), Xiaofei Xie (Singapore Management University), Ruitao Feng (SMU), Yang Liu (Nanyang Technological University)
17:00 | 15m | Talk | (WIP) A Flexible-Granularity Task Graph Representation and its Generation from C Applications | LCTES | Tiago Santos (Faculty of Engineering, University of Porto), João Bispo (Faculdade de Engenharia e Universidade do Porto), João M. P. Cardoso (University of Porto and INESC TEC, Portugal)
17:15 | 25m | Day closing | Award and Closing | LCTES