Written By: Stephan Simon
The official GitHub repository for YARA describes it as “a tool aimed at (but not limited to) helping malware researchers to identify and classify malware samples.” YARA allows anyone to create “descriptions,” known as rules, based on text or binary patterns. Although it can be used to detect patterns for any purpose, it is most often used to detect malware.
A Simple “Hello, World!” Rule
Before diving into the features of YARA, it may be helpful to see just how simple the rules can be. The following rule is a play on the simple “Hello, World!” applications many developers create when learning a new programming language.
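The rule was shown as an image in the original post. A minimal sketch consistent with the description, using the rule name “hello_world” and the string identifier $hello from the text (the layout and comments are assumptions), might look like this:

```yara
rule hello_world
{
    strings:
        // The text to look for; no modifier is given, so YARA treats it as ASCII
        $hello = "hello, world!"

    condition:
        // Match when the string appears anywhere in the scanned data
        $hello
}
```

Saved to a file such as hello_world.yar (a hypothetical name), it could be run from the command line as `yara hello_world.yar <file-or-directory>`.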
As you may have guessed, this rule looks for the text “hello, world!” in whatever is being scanned, whether that is a file on disk or something in memory. The text we want to search for is defined as $hello in the “strings” section, and in the “condition” section we tell YARA that we would like to be alerted if the text is found. As you can see, our “hello_world” rule has only matched against the file with the text we are interested in.
Strings and Modifiers
In the “hello, world!” example, we searched for ASCII-encoded text. YARA supports searching for multiple string types but will default to ASCII when the encoding is not specified. To specify how YARA should search for strings, we can add a modifier at the end of the string definition. For example, keywords like “ascii” or “wide” could be added to the end of a string definition. Many of these modifiers can even be used together to keep rules simple and readable.
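As a rough illustration of modifiers, the sketch below places “ascii” and “wide” after a string definition. The rule name and the second string, which uses the “nocase” modifier (a real YARA modifier, though not one discussed in this post), are assumptions added for illustration.

```yara
rule hello_world_modifiers
{
    strings:
        // Match the text as plain ASCII or as a "wide" string
        // (two bytes per character, as is common in Windows binaries)
        $hello = "hello, world!" ascii wide

        // nocase makes the match case-insensitive, so "HELLO" also matches
        $loud = "hello" nocase

    condition:
        // Match if either string is found
        any of them
}
```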
In addition to text encodings like ASCII or wide-character strings, YARA can also encode text for you before searching. Modifiers like “base64” and “xor” allow the author to write a plain-text string like “hello” and have YARA search for its base64-encoded or XOR-encoded form. Again, this makes rules more readable, as it is obvious what the author wants to find without first having to decode gibberish such as “aGVsbG8=” or “*’..-” (“hello” base64 encoded and XOR encoded with a key of 0x42, respectively).
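A small sketch of the encoding modifiers described above, assuming a reasonably recent YARA release (4.x) in which both “base64” and a keyed “xor” are available. The rule and string names are made up for this example.

```yara
rule hello_encoded
{
    strings:
        // YARA searches for the base64-encoded forms of "hello"
        $b64 = "hello" base64

        // YARA searches for "hello" with every byte XORed with the key 0x42
        $xored = "hello" xor(0x42)

    condition:
        // Either encoded form is enough to match
        any of them
}
```

The rule still reads as “find hello,” even though the bytes YARA looks for in the target are the encoded ones.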
To learn more about what can be done with text strings in YARA rules, see the text strings section of the official documentation.
For YARA to be useful, the rule author must tell it exactly what they want to find. This is done in the “condition” section at the end of the rule. Conditions can be as simple as our example above, or they can be more complicated expressions, such as telling YARA to only match if the file is a Windows executable file under 500KB that contains 3 out of 5 defined strings. Let’s edit the “hello_world” rule:
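The edited rule was also shown as an image. The sketch below is one reading of the description that follows: the three string names and values are taken from the words mentioned in the text, and joining all three checks with “and” is an assumption. A looser reading would be `($hello or $world) and not $foo`.

```yara
rule hello_world
{
    strings:
        $hello = "hello"
        $world = "world"
        $foo   = "foo"

    condition:
        // Assumed logic: both wanted words present, and the unwanted word absent
        $hello and $world and not $foo
}
```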
It now has three defined strings, rather than just one. By reading the condition, you can tell that the author wants to find files that contain the words “hello” or “world,” but not “foo.” Only if all three conditions are met will YARA match against a file. To test this rule, a new file (ignore.txt) was created.
Even though ignore.txt had some of the words we were looking for, it also contained the word “foo” which the rule condition stated should not be present. By carefully considering the content of files when creating conditions, a rule’s author can start to expect a higher level of confidence that a matched search result is the content they are looking for.
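The more involved condition mentioned earlier (a Windows executable under 500KB that contains 3 out of 5 defined strings) could be sketched as follows. The rule name and all five string values are placeholders, not indicators from any real sample.

```yara
rule small_pe_three_of_five
{
    strings:
        // Placeholder strings; a real rule would use values taken from a sample
        $a = "placeholder string one"
        $b = "placeholder string two"
        $c = "placeholder string three"
        $d = "placeholder string four"
        $e = "placeholder string five"

    condition:
        // "MZ" header check, size limit, then at least 3 of the 5 strings
        uint16(0) == 0x5A4D and
        filesize < 500KB and
        3 of them
}
```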
All a YARA rule needs is a name and a condition. The following example will always return a match.
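A minimal sketch of such a rule (the name “always_match” is an assumption) has no strings at all, only a condition that is the constant true:

```yara
rule always_match
{
    condition:
        // No strings are needed; this condition holds for every scanned target
        true
}
```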
To keep things simple, the “hello_world” rule showed two sections: strings and condition. YARA rules can have three sections, though. The “meta” section is found at the top of a rule and can contain anything the author wants to convey about the rule. Meta tags like “author” and “description” are often found to give the author credit and let others know what the rule is meant to accomplish. When creating rules based on malware, authors will sometimes leave tags with file hashes or references to blogs with more information as well. For the purpose of this blog, we’ll just make one final update to “hello_world” by adding an author and description.
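A sketch of what that update might look like is shown here. The meta values are placeholders rather than the text used in the original post, and the strings and condition are kept in the original one-string form for brevity.

```yara
rule hello_world
{
    meta:
        // Free-form key/value pairs; YARA does not use them for matching
        author      = "Your Name"
        description = "Example rule that finds the text hello, world!"

    strings:
        $hello = "hello, world!"

    condition:
        $hello
}
```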
YARA rules don’t have to get complicated to be used in real situations. For example, the screenshot below shows a YARA rule that detects the FuxSocy ransomware. You can also download it here.
Please note that the file extension in this sample has been renamed from .yar to .txt in order to be uploaded for this blog post.
This rule is a bit longer than the “hello_world” example and uses a few features we haven’t talked about, but nothing all that complicated. Working from top to bottom, you may notice the rule name also has “ : ransomware” appended to it. YARA allows tags to be added to rules, which can help organize them or group findings together. In the meta section, I have used a variety of tags, including a reference citing where I learned about the malware and a TLP marker so anyone who reads this rule immediately knows who it can be shared with.
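The rule tag and the kinds of meta entries described above can be sketched as follows. This is not the FuxSocy rule itself: the rule name, every meta value, the URL, and the string are placeholders chosen for illustration.

```yara
rule example_family : ransomware
{
    meta:
        author      = "Your Name"
        description = "Placeholder rule showing a tag and common meta entries"
        // Where the author learned about the malware (placeholder URL)
        reference   = "https://example.com/analysis-writeup"
        // Traffic Light Protocol marker telling readers how widely to share
        tlp         = "white"

    strings:
        $marker = "placeholder marker string"

    condition:
        $marker
}
```

Tags make it possible to run only a subset of rules; the command-line scanner, for example, accepts a tag filter so that only rules tagged “ransomware” are reported.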
The rule condition starts with something that may look rather odd, but it is fairly simple. “uint16(0)” reads the two bytes at offset 0 of the scan target as a little-endian 16-bit unsigned integer, and the rule compares that value to “0x5a4d”. In the file itself those bytes are 0x4D 0x5A, the ASCII characters “MZ”. These two bytes are a file marker, or “magic bytes,” used to determine if the file is a Windows executable (PE) file. Next, the rule limits matches to files less than or equal to 100KB.
Finally, you may have noticed some rather generic-looking names for the strings. This was done on purpose to make writing the condition a little easier. When related strings share a naming convention, a wildcard asterisk in the condition reduces the number of strings the author needs to list explicitly and makes it easier to group strings by importance. The last line of the condition in the FuxSocy rule tells YARA that if any one of the strings named “$n” followed by a number is present, this section of the condition should return true. If none of those strings are found, YARA looks for any four of the “$s” strings. Sometimes it isn’t possible to find completely unique strings in a sample, but the combination an analyst finds could be.
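The shape of that condition can be sketched as below. The header check, size limit, and the “$n” and “$s” naming groups come from the description above. The rule name, the number of strings, and every string value are placeholders and not the actual FuxSocy indicators.

```yara
rule grouped_strings_example : ransomware
{
    strings:
        // "$n" group: strings considered distinctive enough to match alone
        $n1 = "placeholder unique string one"
        $n2 = "placeholder unique string two"

        // "$s" group: weaker strings that only matter in combination
        $s1 = "placeholder common string one"
        $s2 = "placeholder common string two"
        $s3 = "placeholder common string three"
        $s4 = "placeholder common string four"
        $s5 = "placeholder common string five"

    condition:
        // Windows executable header, at most 100KB in size
        uint16(0) == 0x5A4D and
        filesize <= 100KB and
        // Any single "$n" string, or at least four of the "$s" strings
        (any of ($n*) or 4 of ($s*))
}
```

Because the wildcard covers every string with that prefix, a new “$s6” could be added later without touching the condition.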
Can I use YARA if a sample has no obvious text strings?
Absolutely! This post was meant to be a simple introduction to basic features of YARA that anyone can take advantage of, no reverse engineering skills required. In a future post, we will cover how to create rules based on the code itself.