Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Date of Graduation
Master of Science (MS)
Department of Computer Science
Computer programmers often leave their individual programming styles in source code. Recent studies show that, contrary to popular belief, many such programming styles can survive compilation into binary code in controlled environments. These styles can be effectively retrieved from the binary for enhanced binary authorship attribution, a task often called binary program stylometry. In this thesis, we first perform a white-box analysis of the impact of various code compilation factors on programming styles. For the MS Windows platform, we study the impact of multiple compilers (gcc, Clang, and MSVC), their optimization levels, symbol stripping, and the Ghidra decompiler on programming styles. We then rank these factors to provide guidance for binary code stylometry. Next, we perform binary stylometry on a set of six real-world crypto ransomware, aiming for highly automated classification by leveraging the powerful open-source Ghidra decompiler. The validation accuracy reaches 42.6%. This study can serve as a first step toward quickly classifying binary malware before labor-intensive analysis is needed.
Mitchell, Alexander, "Two studies on binary program stylometry: A white-box analysis and a deep learning analysis on some crypto ransomware" (2022). Masters Theses, 2020-current. 182.
Available for download on Sunday, December 08, 2024