Preferred Name
Justin Carpenter
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
ORCID
https://orcid.org/0009-0000-3775-5624
Date of Graduation
5-9-2024
Semester of Graduation
Spring
Degree Name
Master of Science (MS)
Department
Department of Computer Science
Second Advisor
Michael Lam
Third Advisor
Brett Tjaden
Abstract
Binary stylometry aims to extract features from a binary program and use them to identify the developers of the corresponding source code. Despite the noise that the compiler, assembler, linker, and library functions introduce during compilation, two existing machine-learning-based studies of binary stylometry have reported high success rates (Alrabaee, Shirani, Wang, Debbabi, and Hanna 2018; Caliskan, Yamaguchi, Dauber, Harang, Rieck, Greenstadt, and Narayanan 2018). In this thesis, we first observe that both existing studies are based on a largely benign security model and assume that the binaries used in testing and prediction are generated in the same way as the training data. As a result, such binary stylometry techniques would not work on binary mutants that are generated directly from an existing binary through binary instrumentation and that resemble the original. Tracing such a mutant through existing binary stylometry techniques will lead to the developer(s) of the original binary program rather than to the author of the mutant, thus defeating the very purpose of binary attribution in many applications. Next, we examine existing general-purpose static binary instrumentation techniques for transforming existing binary programs into new, meaningful mutants that can be used against binary stylometry. We find them less than ideal in the binary stylometry setting. To demonstrate the practicality of binary instrumentation attacks on binary stylometry, we then instrument, with minimal changes, the security-intensive 2013 NSA Codebreaker Challenge program to strengthen its security design. The success of this effort gives us high confidence in the practicality of the attack.