Preferred Name
Alex Mitchell
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
ORCID
https://orcid.org/0000-0002-0590-0812
Date of Graduation
12-17-2022
Semester of Graduation
Fall
Degree Name
Master of Science (MS)
Department
Department of Computer Science
Abstract
Computer programmers often leave their individual programming styles in source code. Recent studies show that, contrary to popular belief, many such programming styles can survive compilation into binary code in controlled environments. These styles can be retrieved from the binary and used for enhanced binary authorship attribution, a task often called binary program stylometry. In this thesis, we first perform a white-box analysis of how various factors in code compilation affect programming styles. For the MS Windows platform, we study the impact of multiple compilers (gcc, Clang, and MSVC), their optimization levels, symbol stripping, and the Ghidra decompiler. We then rank these factors to provide guidance for binary code stylometry. Next, we perform binary stylometry on a set of six real-world crypto ransomware, aiming for highly automated classification by leveraging the open-source Ghidra decompiler. The validation accuracy reaches 42.6%. This study can serve as a first step for quickly classifying binary malware before labor-intensive analysis is needed.