Abstract
Large Language Models (LLMs) have emerged as powerful tools across many domains. However, effectively auditing the content they generate remains a challenge, and their deployment introduces novel security and privacy concerns. This talk addresses two pivotal questions: (1) How can the origin of generated content be precisely traced throughout its lifecycle? (2) What distinctive security and privacy risks arise when LLMs are deployed in decentralized settings? Our presentation will introduce a new approach that uses error-correcting codes to build a robust multi-bit watermarking method. We will also present a novel attack that accurately recovers input text from the intermediate LLM activations transmitted during decentralized machine learning. These advances mark notable progress in uncovering vulnerabilities inherent in LLMs and in improving their reliability in real-world applications.
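For intuition only, the short Python sketch below shows the general idea of carrying a multi-bit payload in token choices while protecting it with an error-correcting code. The toy vocabulary, the hash-keyed green/red vocabulary split, and the simple repetition code are illustrative assumptions, not the construction that will be presented in the talk.

# Illustrative sketch only: a toy multi-bit watermark whose payload is
# protected by a repetition code (the simplest error-correcting code).
import hashlib
import random

REPEAT = 5          # repetition-code redundancy
VOCAB_SIZE = 50     # toy vocabulary of token ids 0..49

def ecc_encode(bits):
    """Repeat every payload bit REPEAT times."""
    return [b for b in bits for _ in range(REPEAT)]

def ecc_decode(coded):
    """Majority-vote each block of REPEAT bits back into one payload bit."""
    return [1 if sum(coded[i:i + REPEAT]) * 2 >= REPEAT else 0
            for i in range(0, len(coded), REPEAT)]

def green_list(prev_token):
    """Pseudorandomly split the vocabulary in half, keyed on the previous token."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    ids = list(range(VOCAB_SIZE))
    random.Random(seed).shuffle(ids)
    return set(ids[:VOCAB_SIZE // 2])

def embed(message_bits, length=200, seed=0):
    """Toy 'generation': at each step, sample a token from the green half if the
    current coded bit is 1, otherwise from the red half. A real scheme would
    bias an LLM's logits instead of sampling uniformly."""
    coded = ecc_encode(message_bits)
    rng = random.Random(seed)
    tokens, prev = [], 0
    for step in range(length):
        greens = green_list(prev)
        pool = greens if coded[step % len(coded)] == 1 else set(range(VOCAB_SIZE)) - greens
        tok = rng.choice(sorted(pool))
        tokens.append(tok)
        prev = tok
    return tokens

def extract(tokens, n_bits):
    """Read one vote per token (green -> 1, red -> 0), majority-vote the votes for
    each coded position, then let the repetition code correct residual errors."""
    coded_len = n_bits * REPEAT
    votes = [[] for _ in range(coded_len)]
    prev = 0
    for step, tok in enumerate(tokens):
        votes[step % coded_len].append(1 if tok in green_list(prev) else 0)
        prev = tok
    coded = [1 if sum(v) * 2 >= len(v) else 0 for v in votes]
    return ecc_decode(coded)

if __name__ == "__main__":
    message = [1, 0, 1, 1]
    text = embed(message)
    # Corrupt a few tokens to mimic light editing; the redundancy should absorb it.
    rng = random.Random(42)
    for i in rng.sample(range(len(text)), 6):
        text[i] = rng.randrange(VOCAB_SIZE)
    print("original :", message)
    print("recovered:", extract(text, n_bits=len(message)))

A practical scheme would replace the repetition code with a stronger error-correcting code and embed bits by gently biasing the model's next-token distribution rather than by hard selection, which is where the robustness questions discussed in the talk arise.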
Lecturer
Wenjie Qu is currently pursuing his Ph.D. in the Department of Computer Science at the National University of Singapore, under the supervision of Prof. Jiaheng Zhang. His research lies at the intersection of AI security, applied cryptography, and systems. He has published at esteemed conferences such as CCS, NDSS, NeurIPS, ISSTA, and DAC, spanning security, machine learning, and software engineering. In recognition of his academic excellence, Wenjie has been honored with the NUS President's Graduate Fellowship and the National Scholarship. He received his bachelor's degree from Huazhong University of Science and Technology.