Impressive Results Once Again
Recently, the ICSE 2024 conference, a Class-A international academic conference recommended by the China Computer Federation (CCF), was held in Lisbon, Portugal. The Software Engineering Team from Zhejiang University published a total of 11 papers, with Zhejiang University being the first author affiliation for eight of them. One paper received the ACM SIGSOFT Distinguished Paper Award. Additionally, another paper was presented at MSR 2024, winning both the MSR ACM SIGSOFT Distinguished Paper Award and the MSR 2024 FOSS Impact Award.
ICSE, the International Conference on Software Engineering, is a globally recognized flagship conference in the field of software engineering and is held annually; ICSE 2024 was its 46th edition. The Software Engineering Team at Zhejiang University has once again made remarkable progress in software engineering research!
Second from the left: Mr. Hu Xing from Zhejiang University, third from the left: Mr. Chen Junkai from Zhejiang University
Second from the right: Mr. Hu Xing from Zhejiang University
Third from the left: Mr. Hu Xing from Zhejiang University
List of Papers (in alphabetical order by the first author’s name)
Paper introduction
01: Code Search is All You Need? Improving Code Suggestions with Code Search ★ (ACM SIGSOFT Distinguished Paper Award) ★
Authors: Chen Junkai, Hu Xing, Li Zhenhao, Gao Cuiyun, Xia Xin, David Lo
Abstract: Modern development environments provide automated code suggestion tools to help developers work more efficiently. These tools often retrieve similar code snippets from repositories or use deep learning models to provide recommendations. However, the systematic use of code search to enhance code suggestions has not been extensively explored. This paper investigates a search-based code recommendation framework. Our framework employs various retrieval methods and search strategies to locate similar code, thereby improving the recommendation performance of language models. Experiments on different language models showed that this framework significantly enhances code suggestion performance.
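The general idea of using code search to enrich a model's input can be sketched as follows. This is only a minimal illustration, not the paper's framework: the token-overlap (Jaccard) ranking, the function names `retrieve` and `build_prompt`, the `CORPUS` snippets, and the comment-style prompt layout are all assumptions made for this example. The resulting prompt would then be passed to whichever language model produces the suggestion.

```python
import re


def tokenize(code: str) -> set[str]:
    """Split code into a set of lowercase identifier and number tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus snippets most similar to the query (hypothetical retriever)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda s: jaccard(q, tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(context: str, corpus: list[str], k: int = 2) -> str:
    """Prepend the retrieved snippets, as comments, to the unfinished code."""
    parts = []
    for snippet in retrieve(context, corpus, k):
        commented = "\n".join("# " + line for line in snippet.splitlines())
        parts.append("# Similar code:\n" + commented)
    parts.append(context)
    return "\n".join(parts)


# Toy code base, invented for this example.
CORPUS = [
    "def read_json(path):\n    with open(path) as f:\n        return json.load(f)",
    "def add(a, b):\n    return a + b",
    "def write_json(path, data):\n    with open(path, 'w') as f:\n        json.dump(data, f)",
]

if __name__ == "__main__":
    unfinished = "def load_json(path):\n    with open(path) as f:"
    print(build_prompt(unfinished, CORPUS, k=1))
```

A real system would likely replace the Jaccard ranking with a stronger lexical or embedding-based retriever; the sketch only shows where retrieval fits relative to the model's input.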
02: Exploiting Library Vulnerability via Migration Based Automating Test Generation
Authors: Chen Zirui, Hu Xing, Xia Xin, Gao Yi, Xu Tongtong, David Lo, Yang Xiaohu
Abstract: In software development, third-party libraries are widely used to avoid reimplementing existing functionalities. When a new library vulnerability is disclosed, project maintainers need to determine if their project is affected, requiring extensive evaluations. Existing tools face issues such as generating false positives and low success rates in complex scenarios. This study introduces VESTA, a new exploitation-based approach that provides tests for developers to decide whether to update dependencies. We experimented on 30 vulnerabilities disclosed over the past five years and found that VESTA had a success rate of 71.7%, a 53.4% improvement over existing methods.
03: MUT: Human-in-the-Loop Unit Test Migration
Authors: Gao Yi, Hu Xing, Xu Tongtong, Xia Xin, David Lo, Yang Xiaohu
Abstract: While test migration has proven effective for mobile application testing, unit test migration at the source code level remains underexplored, particularly for C++ programs. This study introduces MUT, a test generation method based on migration, which facilitates test reuse across languages and platforms. MUT maps code between source and target projects, selects suitable unit tests for migration, and translates the tests to be compatible with the target project. A web tool was developed to assist developers with this migration process.
04: Pre-training by Predicting Program Dependencies for Vulnerability Analysis Tasks
Authors: Liu Zhongxin, Tang Zhijie, Zhang Junwei, Xia Xin, Yang Xiaohu
Abstract: Understanding code semantics, such as program dependencies, is critical for tasks like vulnerability analysis. Existing pre-trained models often overlook or inadequately leverage code semantics. To address this, we propose two novel pre-training tasks, statement-level control dependency prediction and token-level data dependency prediction, to help models learn code semantics more effectively. Using these tasks, we developed PDBERT, a pre-trained code model that can be fine-tuned to improve downstream tasks such as vulnerability detection and classification.
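To make the notion of a data dependency concrete, the sketch below derives def-use pairs for a tiny straight-line program. It is an illustration under stated assumptions, not PDBERT's actual labeling pipeline: the token representation (a `("def" | "use", name)` pair per variable occurrence, listed in evaluation order so that reads on the right-hand side come before the write on the left), the function name `data_dependency_pairs`, and the toy program are all hypothetical.

```python
def data_dependency_pairs(tokens: list[tuple[str, str]]) -> set[tuple[int, int]]:
    """Return (def_index, use_index) pairs for straight-line code.

    A "use" token depends on the most recent earlier "def" token of the
    same variable. Branches and loops are deliberately not handled here.
    """
    last_def: dict[str, int] = {}
    pairs: set[tuple[int, int]] = set()
    for i, (role, name) in enumerate(tokens):
        if role == "use":
            if name in last_def:
                pairs.add((last_def[name], i))
        elif role == "def":
            last_def[name] = i
    return pairs


# Toy program (hypothetical), one entry per variable occurrence:
#   n = read(); buf = alloc(n); n = n + 1; x = buf[n]
TOKENS = [
    ("def", "n"),    # 0: n = read()
    ("use", "n"),    # 1: alloc(n) reads n
    ("def", "buf"),  # 2: buf = ...
    ("use", "n"),    # 3: n + 1 reads the old n
    ("def", "n"),    # 4: n = ...
    ("use", "buf"),  # 5: buf[n] reads buf
    ("use", "n"),    # 6: buf[n] reads the new n
    ("def", "x"),    # 7: x = ...
]

if __name__ == "__main__":
    print(sorted(data_dependency_pairs(TOKENS)))
```

Pairs of this kind are the sort of relation a data dependency prediction task asks a model to recover from code; real programs need a proper data-flow analysis that accounts for control flow.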
05: Towards More Practical Automation of Vulnerability Assessment
Authors: Pan Shengyi, Bao Lingfeng, Zhou Jiayuan, Hu Xing, Xia Xin, Li Shanping
Abstract: Assessing vulnerability severity is a crucial step in managing software vulnerabilities. Our research identifies potential relationships between CVSS metrics and develops a prompt-based learning model to predict combinations of these metrics. We also propose two new evaluation measures to better assess model performance in prioritizing vulnerabilities. Experimental results show that our approach effectively addresses the limitations of existing methods.
06: PPT4J: Patch Presence Test for Java Binaries
Authors: Pan Zhiyuan, Hu Xing, Xia Xin, Zhan Xian, David Lo, Yang Xiaohu
Abstract: Security patches are essential to protecting software from vulnerabilities. However, determining whether a patch has been integrated, particularly when only binaries are available, can be challenging. This paper presents PPT4J, a patch presence test framework for Java binaries that extracts semantic information from patches and uses feature-based techniques to identify patch lines in binaries. A practical evaluation on JetBrains IntelliJ IDEA revealed an unpatched third-party library, which was reported to the vendor.
07: Streamlining Java Programming: Uncovering Well-Formed Idioms with IdioMine
Authors: Yang Yanming, Hu Xing, Xia Xin, David Lo, Yang Xiaohu
Abstract: Identifying common code idioms is challenging but crucial for improving code quality and maintainability. We propose IdioMine, a new approach that automatically extracts common idioms from Java projects and libraries. Our experiments and user studies confirm the correctness and practical value of IdioMine.
08: PS3: Precise Patch Presence Test based on Semantic Symbolic Signature
Authors: Zhan Qi, Hu Xing, Li Zhiyang, Xia Xin, David Lo, Li Shanping
Abstract: Testing for security patches in large software systems is critical for ensuring security. Existing methods are often limited by compiler options. We present PS3, a new method that uses symbolic simulation to extract stable signatures across different compiler options, allowing for precise patch presence testing.
About the Laboratory
The Software Engineering Team at Zhejiang University has achieved world-class research results in software analysis, software repository mining, empirical software engineering, and AI testing and analysis. The team has collaborated with renowned institutions such as Singapore Management University, the Singapore University of Technology and Design, Monash University, and the University of British Columbia. They have also formed industry-academic partnerships with companies like Huawei, State Street Bank, and the Shanghai Pudong Development Bank, making a significant impact on both China's software industry and the global open-source community.