Threat actors have begun migrating from older and well-known compiled languages such as C/C++ to more modern languages. These newer compiled languages include Golang, Rust and Nim. These languages give developers the ability to compile programs for both Linux and Windows with little to no modification of the code. This shift has required defenders to adapt and develop analysis techniques to detect malicious programs written in these languages.
Rust’s adoption for malware is in its infancy compared to Golang or Nim; however, that is quickly changing. Recently, there have been a few notable examples of malware written in Golang and Rust being used in the wild. Such examples include the BlackCat ransomware and a publicly shared information stealer dubbed “Luca Stealer”.
Luca Stealer was released publicly under an open-source code model. The public release of its source code allows security researchers who are unfamiliar with the Rust programming language to gain insight into how the language is used to build malware.
Hello World Rust Program Analysis & Finding the Main Function
The Rust compiler toolchain consists of rustc, which compiles Rust code into executables, and cargo, which is the main build system for Rust projects. Cargo can generate a new project with the command “cargo new [project name]”. This creates a new directory named after the project, containing a “Hello, world!” template program, and sets up version control in the directory using git.
The directory structure contains the following:
- src/*: Source code for the project.
- .gitignore: File containing a list of directories and files that Git should ignore.
- Cargo.toml: Rust Cargo Manifest file containing metadata about the project such as the author, Rust edition, project version and external dependency definitions.
The “main.rs” file includes a main function that will print out the string “Hello, world!”.
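For reference, the template is short enough to show in full. With the toolchain version used in this example, the generated “src/main.rs” is expected to contain only the following:

```rust
// src/main.rs as generated by `cargo new`
fn main() {
    println!("Hello, world!");
}
```

Everything else found in the compiled binary (panic handling, formatting machinery, runtime start-up) comes from the standard library rather than from these three lines.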
The project can be built by running the command “cargo build” in the project directory.
This will build a development version of the project. Cargo has four built-in build profiles: dev, release, test and bench. Each profile contains different pre-defined build settings such as optimization levels, panic strategies, debug information settings and more. The defaults for the built-in profiles can be changed in the Cargo.toml file, which can also define custom build profiles. More information about this can be found in the Cargo Profiles documentation.
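As an illustration of how much control the manifest gives over the final binary, the fragment below overrides several defaults of the release profile. The keys are standard Cargo profile settings; the specific values are assumptions chosen for the example and do not come from any analyzed sample.

```toml
# Cargo.toml (fragment): hypothetical overrides for the release profile
[profile.release]
opt-level = 3        # maximum optimization
debug = false        # no debug information
panic = "abort"      # abort instead of unwinding on panic
lto = true           # link-time optimization across all crates
codegen-units = 1    # a single codegen unit, allowing more inlining
```

Changing any one of these values can noticeably alter the generated code even when the source is unchanged, which matters later when comparing samples of the same family.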
The resulting binary will be placed in the “target/debug/” or “target/release/” directory depending on which profile was specified during the build. Since it is more common for malware and other compiled programs to be built in release mode, the project can be built using the command “cargo build --release”. The resulting binary will be placed in the “target/release/” directory. Cargo will output many different artifacts during the build process here; however, the compiled binary will be the project name with the “.exe” extension. When building binaries for Windows targets, Cargo will also create a “.pdb” symbol file with symbol information for the executable by default regardless of what build profile was specified.
This example uses rustc and cargo version 1.62.1 along with the Rust 2021 edition.
For a small program such as this hello world example, the defined strings in the binary mostly contain information for the program’s panic handler.
It is also worth noting that Rust strings are not null terminated. This can cause issues during analysis with reverse engineering tools that expect strings to end with a null byte. Since there is no null byte to denote the end of a string, adjacent strings can appear to overlap depending on where the compiler lays them out in the binary.
For example, the string “library\std\src\sys\windows\stdio.rs” overlaps with the string “Unexpected number of bytes for incomplete UTF-8 codepoint” here in the binary.
This can be fixed by undefining the string at that location with “Clear Code Bytes” in Ghidra.
Once that section of data is undefined, a character array can be created using the length of the first string. In this example, the first string is “library\std\src\sys\windows\stdio.rs” which has a length of 36 characters.
Finally, the second string can be defined as a TerminatedCString data type.
This results in two separate defined strings making analysis and identifying cross-references easier.
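The reason for the overlap is visible from the Rust side: a string slice is a pointer plus a length, so the compiler has no need to emit a terminator. The sketch below checks this and then mimics the manual fix-up by splitting two back-to-back strings at a known length. The helper name “split_at_len” is hypothetical, and the byte buffer only simulates how the two literals might sit next to each other in the binary.

```rust
use std::mem::size_of;
use std::str::from_utf8;

/// Splits back-to-back string data at a known length. As in the manual
/// Ghidra fix-up, the first string's length is the only boundary marker.
fn split_at_len(blob: &[u8], first_len: usize) -> Option<(&str, &str)> {
    if first_len > blob.len() {
        return None;
    }
    let (first, second) = blob.split_at(first_len);
    Some((from_utf8(first).ok()?, from_utf8(second).ok()?))
}

fn main() {
    let path = r"library\std\src\sys\windows\stdio.rs";
    let message = "Unexpected number of bytes for incomplete UTF-8 codepoint";

    // A &str is a (pointer, length) pair, so no terminator is needed.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert!(!path.as_bytes().contains(&0));

    // Simulate the two literals being laid out next to each other.
    let blob = [path.as_bytes(), message.as_bytes()].concat();
    let (first, second) = split_at_len(&blob, path.len()).expect("valid UTF-8");
    println!("{} bytes: {}", first.len(), first);
    println!("{} bytes: {}", second.len(), second);
}
```

In a real binary the length is not stored next to the string data; it usually appears as an immediate value or in a nearby pointer-and-length pair, which is why the boundary has to be recovered from the referencing code.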
The binary also includes a “Debug Data” section visible in Ghidra’s “Program Trees” window.
This section includes a “DotNetPDBInfo” structure containing the path to the PDB file when the binary was compiled.
Ghidra will sometimes truncate the string containing the PDB name. This can be fixed by right-clicking on the “.NET PDB Info” structure header and selecting “Clear Code Bytes”.
After the structure is cleared, the PDB name can be defined as a terminated C-string by right-clicking the start of the PDB name and selecting “Data -> TerminatedCString”.
This could potentially leak the username of the user who compiled the program depending on where on disk the program was compiled.
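A minimal sketch of how such a leak could be checked automatically is shown below. It assumes the project was built somewhere under a Windows user profile directory; the function name and the example path (including the account name “analyst”) are hypothetical.

```rust
/// Returns the path component that follows a "Users" directory, if any.
fn username_from_pdb_path(path: &str) -> Option<&str> {
    let mut parts = path.split(|c| c == '\\' || c == '/');
    while let Some(part) = parts.next() {
        if part.eq_ignore_ascii_case("Users") {
            // The next component is the profile directory name.
            return parts.next().filter(|name| !name.is_empty());
        }
    }
    None
}

fn main() {
    // Hypothetical PDB path; "analyst" is a made-up account name.
    let pdb = r"C:\Users\analyst\projects\hello\target\release\deps\hello.pdb";
    match username_from_pdb_path(pdb) {
        Some(name) => println!("possible build user: {}", name),
        None => println!("no user profile directory in path"),
    }
}
```

A path built outside of a user profile (for example on a separate build drive) yields no result, so the absence of a name says nothing about the author.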
The entry point for the program will set up the program’s stack cookie with “__security_init_cookie” and do some other initialization of the execution environment. The entry point will then get a pointer to the process’ argument array, argument counter and environment variables. These values are then passed to a function for initializing the main Rust environment.
The program will then set up a call to “std::rt::lang_start_internal” with the address of the main function of the written Rust program and the program argument array, environment variables array and program argument counter. Here, the address of the “main” function of the user-defined code is being placed in the RAX register.
Some tools will fail to analyze this section of code properly.
If the bytes were not properly analyzed in Ghidra, they can be disassembled through the “Disassemble” button in the right click menu.
A function can be defined by clicking on the “Create Function” button in the right click context menu.
Variations of this technique for finding the main function will depend on the Rust compiler version; however, the entry point of the binary will eventually call “std::rt::lang_start_internal”, passing the address of the main function, the program argument counter, the program argument array and the environment variables array as arguments.
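The call chain described above can be modelled in a few lines of Rust. This is a deliberately simplified stand-in and not the real signature of “std::rt::lang_start_internal”, which is an unstable internal detail that changes between releases; environment handling is omitted and all function names below are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static USER_MAIN_RAN: AtomicBool = AtomicBool::new(false);

/// Stand-in for the developer-written `main` function.
fn user_main() {
    USER_MAIN_RAN.store(true, Ordering::SeqCst);
}

/// Simplified model of the runtime start routine: it receives the address
/// of the user's main together with the process arguments, performs its
/// set-up, then calls through the function pointer.
fn lang_start_model(main: fn(), argc: isize, argv: *const *const u8) -> isize {
    let _ = (argc, argv); // the real runtime stores these for std::env::args
    main();
    0
}

/// Simplified model of the compiler-generated entry wrapper. The address of
/// `user_main` being loaded as an argument here is the pattern to look for.
fn entry_model(argc: isize, argv: *const *const u8) -> isize {
    lang_start_model(user_main, argc, argv)
}

fn main() {
    let code = entry_model(0, std::ptr::null());
    println!("exit code: {}", code);
    println!("user main ran: {}", USER_MAIN_RAN.load(Ordering::SeqCst));
}
```

The point of the model is that the user's main is never called directly from the entry point; its address travels as data, which is why it shows up as a value loaded into a register or pushed on the stack just before the runtime call.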
Revisiting Embedded Strings
The Rust compiler embeds many diverse types of strings in the binary. Some of the more notable ones are due to Rust’s dependency management system. The Rust build system “Cargo” has a dependency system similar to Node.js Package Manager (NPM) and Golang. External dependencies in Rust are called “crates”. Including crates into a project can be done by adding them into the “dependencies” section of the Cargo.toml file in a Rust project.
If a project wants to import the base64 crate to handle base64 encoding and decoding, it can be done as follows:
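A minimal sketch of such a manifest entry is shown here. The “[dependencies]” table and the crate name are standard; the version requirement is an illustrative assumption.

```toml
# Cargo.toml (fragment)
[dependencies]
base64 = "0.13"   # hypothetical version requirement
```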
When the project is built, cargo will automatically fetch the crate from the crates.io registry and store it in the “.cargo/registry” directory of the user’s home directory by default. During the compilation step, cargo will build the object files for the crate and store them in the “target/*” directory, then link the crate’s object files with the project’s compiled object files. This results in a binary which is statically linked with the specified external crates. During compilation, the compiler will include all the panic information for each crate. This includes the path to the crate on disk of the system which compiled the binary.
If no post-compilation string obfuscation has been applied, these paths are a good indicator of the potential capabilities of a program. Searching for the “github-” and the “cargo\registry” strings can be used to find the dependencies a program is using.
I created a Ghidra script to find crates imported by a program based on these strings and list them. https://github.com/BinaryDefense/GhidraRustDependenciesExtractor.
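The idea behind this kind of extraction can be sketched independently of Ghidra. The code below is not the linked script; it is a minimal illustration that assumes registry paths of the form “cargo\registry\src\[index directory]\[crate name]-[version]\...”. All function names and the sample strings are hypothetical.

```rust
use std::collections::BTreeSet;

/// True if `s` starts with a MAJOR.MINOR.PATCH core, ignoring any
/// pre-release or build suffix.
fn looks_like_semver(s: &str) -> bool {
    let core = s.split(|c| c == '-' || c == '+').next().unwrap_or("");
    let fields: Vec<&str> = core.split('.').collect();
    fields.len() == 3
        && fields
            .iter()
            .all(|f| !f.is_empty() && f.bytes().all(|b| b.is_ascii_digit()))
}

/// Splits "name-version" at the first hyphen followed by a version, so
/// crate names that contain hyphens or digits are kept whole.
fn split_name_version(dir: &str) -> Option<(&str, &str)> {
    dir.match_indices('-')
        .map(|(i, _)| (&dir[..i], &dir[i + 1..]))
        .find(|(name, version)| !name.is_empty() && looks_like_semver(version))
}

/// Extracts (crate, version) from a path that passes through the registry.
fn crate_from_path(path: &str) -> Option<(String, String)> {
    const MARKER: &str = "cargo/registry/src/";
    let normalized = path.replace('\\', "/");
    let start = normalized.find(MARKER)? + MARKER.len();
    let mut parts = normalized[start..].split('/');
    let _index_dir = parts.next()?; // registry index directory, skipped
    let (name, version) = split_name_version(parts.next()?)?;
    Some((name.to_string(), version.to_string()))
}

/// Collects the unique crates referenced by a list of extracted strings.
fn list_crates(strings: &[&str]) -> BTreeSet<(String, String)> {
    strings.iter().filter_map(|s| crate_from_path(s)).collect()
}

fn main() {
    // Hypothetical strings as they might appear in a string listing.
    let strings = [
        r"C:\Users\analyst\.cargo\registry\src\github.com-1ecc6299db9ec823\base64-0.13.0\src\decode.rs",
        r"C:\Users\analyst\.cargo\registry\src\github.com-1ecc6299db9ec823\base64-0.13.0\src\encode.rs",
        r"library\std\src\sys\windows\stdio.rs",
    ];
    for (name, version) in list_crates(&strings) {
        println!("{} {}", name, version);
    }
}
```

Keying on the registry directory rather than on a fixed index name is a design choice made here so that the sketch does not depend on one particular registry layout.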
Unknown Rust Malware Sample Analysis and Finding the Main Function
This is an unknown malware sample written in Rust. Using the methodologies above, a basic overview of the malware’s capabilities and the main function can be found.
Looking at the “Debug Data” section of the binary, the PDB name path reveals the following path to the built PDB file:
Running the “RustDependencyStrings.py” script against the binary reveals the following crates being used:
Most of the dependencies listed are sub-dependencies and were not explicitly specified by the malware developer. Identifying which ones are sub-dependencies of a Rust crate can be done by looking at the crate’s documentation. The docs.rs website is a centralized location where developers host crate documentation publicly. Searching for the “reqwest” crate at docs.rs shows that it is a library for performing HTTP requests.
The entry point of this program follows the same pattern as the “Hello, world!” example compiled above.
The entry point will get pointers to the program arguments array, the environment variables array and the argument counter, then pass those arguments to FUN_00404c60 after some initialization.
This is where the registers and stack are set up to call “std::rt::lang_start_internal” like the “Hello, world!” example.
The code is not completely identical to the example above due to differing compiler versions and this sample being a 32-bit binary; however, LAB_00404410 is placed on the stack along with the argument counter, program arguments array and environment variables array received from the calling function. Since these values are passed together as arguments to “std::rt::lang_start_internal”, it can be inferred that the LAB_00404410 label is the main function for the program.
This technique of finding the main function of the binary can be translated to many different Rust binaries which use the standard Rust runtime. Unfortunately, due to the unstable Application Binary Interface (ABI) and the Rust compiler constantly changing, this pattern has the potential of changing in later releases.
Rust Features Which Make Analysis Difficult
The way Rust handles errors and its native implementation of memory-safe operations make reverse engineering significantly more difficult. Essentially, the compiler will insert a lot of code that does the error checking for the developer. Examples of this are bounds checks on array accesses and on statements which could potentially read or write outside the appropriate allocations of memory; when such a check fails, the program panics. Rust is designed to reduce undefined behavior by inserting these checks into the program for the developer both at compile time and at runtime. A language such as C or C++ will not implement these checks for the developer, leaving them with the responsibility of memory management and safety.
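A small example makes the inserted checks concrete. In the sketch below, the one-line indexing function compiles to a length comparison plus a branch to a panic routine, and that extra path is what shows up in the disassembly. The function names are hypothetical, and catching the panic assumes the default unwinding panic strategy.

```rust
use std::panic;

/// Plain indexing: the compiler adds a length check and a panic path.
fn read_at(data: &[u8], index: usize) -> u8 {
    data[index]
}

/// `get` performs the same check but reports failure as `None`.
fn try_read_at(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

fn main() {
    let data = [10u8, 20, 30];
    println!("in bounds: {:?}", try_read_at(&data, 1));
    println!("out of bounds: {:?}", try_read_at(&data, 7));

    // The out-of-bounds index triggers the compiler-inserted panic path.
    // The panic message written to stderr includes the source location,
    // which is one reason so many path strings end up in Rust binaries.
    let result = panic::catch_unwind(|| read_at(&[10, 20, 30], 7));
    println!("indexing panicked: {}", result.is_err());
}
```

When a binary is built with the abort panic strategy the panic cannot be caught, but the check itself is still present in the generated code.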
Rust does not have a stable Application Binary Interface (ABI) for calling conventions and the compiler is constantly changing with each release. The generated binary file from a Rust project can vary drastically depending on what compiler version is being used and what build options are set. Compilation options such as link-time optimization and codegen-units will drastically change the binary even if the source code does not change. This makes it increasingly difficult for reverse engineers and detection systems to identify the behavior of Rust malware, even if they have analyzed a previous version of the same malware family, due to a high degree of variation in code patterns.
Some of the lesser-known features of Rust are its malleability and low-level interfaces. Rust supports directly managing memory through pointers, inline assembly and even compiling code without the standard library. Compiling code without the standard library is not limited to embedded devices. A Rust developer can create a program without depending on the Rust standard library and only import system libraries as needed. The resulting program will be significantly smaller in size but still include all the dynamic error checking. These low-level features make the language great for systems development; however, they also give malware developers a lot of flexibility in the memory obfuscation techniques they can employ to make analysis increasingly difficult.
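As a small, benign illustration of the raw pointer access mentioned above, the sketch below transforms a buffer in place through a raw pointer. Inside the “unsafe” block the writes carry no compiler-inserted bounds checks, so the code looks much closer to what a C compiler would emit. The function name and the choice of a single-byte XOR are assumptions made for the example and are not taken from any analyzed sample.

```rust
/// XORs every byte of `buf` with `key` through a raw pointer.
fn xor_in_place(buf: &mut [u8], key: u8) {
    let len = buf.len();
    let ptr = buf.as_mut_ptr();
    for offset in 0..len {
        // SAFETY: offset < len, so the pointer stays inside `buf`.
        unsafe { *ptr.add(offset) ^= key };
    }
}

fn main() {
    let mut data = *b"reqwest";
    xor_in_place(&mut data, 0x5a);
    println!("encoded bytes: {:02x?}", data);

    // Applying the same key again restores the original bytes.
    xor_in_place(&mut data, 0x5a);
    println!("decoded text: {}", String::from_utf8_lossy(&data));
}
```

Data handled this way does not appear as a readable string in the binary, which limits the usefulness of the string-based techniques described earlier.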
There has been a recent rise in malware utilizing newer programming languages and Rust is seeing an increase in popularity among malware developers. Although adopting a new language creates more work initially for malware developers, some new languages offer lower anti-virus detection rates and make reverse engineering more difficult for analysts. Rust is still an uncommon language for malware; however, it is becoming increasingly prevalent in ransomware and information-stealing malware. The detection landscape for this malware lacks maturity when compared to detections for malware developed in more traditional languages such as C/C++ or C#. This allows malware developers to evade detections with relative ease compared to malware written in more common languages.
Rust’s malleability and volatility provide a wide avenue of evasion techniques for malware developers without the need to rewrite the malware in its entirety. Rust is still an evolving language and much of it could change in the future. There are some commonalities and artifacts which can be used as indicators of the behavior of unknown Rust programs, along with techniques that help identify which code was written by the developer. Further research into Rust reverse engineering can be done through more dynamic means such as debugging, emulation and behavioral analysis sandboxes. These can provide quicker insight into the capabilities of an unknown malware sample than static analysis. As malware analysts gain experience with reverse engineering Rust malware and share open-source tools with the security research community, everyone involved in securing and defending digital devices can benefit.
By: Matt Ehrnschwender (@M_alphaaa)