Continuing our series on YARA, in Part 3 (see here for Part 2 and Part 1) let's spend some time distributing logic across multiple rules.
It is fast and easy to put together a monolithic rule, but that approach suffers when it comes time to extend, expand or combine the rule. Slicing logic across multiple rules provides the same benefits as modular programming, for much the same reasons.

YARA really only does a few things, but it does them so well, and is so versatile, that it can be applied to a large number of situations. One of those tasks, and a chief one at that, is classifying files. To classify something, you must know its characteristics (elements, attributes, components, etc.) and how they fit within the class; that is how you determine whether or not it falls within the class. Classes can be wide (Ransomware) or narrow (Crysis). They also have a family tree of mothers, fathers, cousins, siblings and other extended family. For example, ransomware is a close cousin of destructive malware, and both are descendants of malware. How you define the classes and what you name them is up to you. Here's how Kaspersky handles their classification:
I've seen three approaches to classifying files that seem to work. The first is to iteratively match class characteristics against an unknown file. The second is to identify the characteristics of an unknown file and look for a matching class. The third is a composite of the two.
For the first approach, you must first boil class characteristics down into rules. This is a more classic use of YARA: you run logically grouped rules against an unknown file. The rules are very specific and usually declarative. For example, a class might be defined as:
Generic Class Characteristics
- Portable Executable
- 6-9 code sections
- 1 section has a zero-byte hash value
- Defined communication string
- Overlay of obfuscated data at EOF
- File size between 1MB and 2MB
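To make the declarative nature of this approach concrete, here is a minimal Python sketch (not YARA, and not the author's rule) that checks a file against every characteristic in the list above and matches only if all of them hold. It assumes the file has already been parsed into a hypothetical `FileFacts` record; the communication string `COMM_STRING`, the overlay entropy cut-off, and the reading of "zero-byte hash value" as "the section's data hashes like empty input" are all illustrative assumptions. In YARA itself, similar facts are exposed through the `filesize` keyword and the `pe` and `hash` modules.

```python
import hashlib
import math
from collections import Counter
from dataclasses import dataclass, field

# MD5 of zero bytes of input; an empty section hashes to this value.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()

# Hypothetical "defined communication string", for illustration only.
COMM_STRING = b"example-beacon"

# Assumed cut-off: overlays above this entropy are treated as obfuscated.
OVERLAY_ENTROPY_THRESHOLD = 7.0

MB = 1024 * 1024  # same multiplier as YARA's MB suffix


@dataclass
class FileFacts:
    """Hypothetical pre-parsed view of a file (a real tool would use a PE parser)."""
    header: bytes                  # first bytes of the file
    size: int                      # total file size in bytes
    sections: list[bytes]          # raw data of each code section
    overlay: bytes = b""           # data after the last section, up to EOF
    strings: set[bytes] = field(default_factory=set)


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; 0.0 for empty input."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def matches_generic_class(f: FileFacts) -> bool:
    """True only if every declared class characteristic holds."""
    empty_sections = sum(
        1 for s in f.sections if hashlib.md5(s).hexdigest() == EMPTY_MD5
    )
    return (
        f.header.startswith(b"MZ")                 # PE (simplified: DOS "MZ" magic only)
        and 6 <= len(f.sections) <= 9              # 6-9 code sections
        and empty_sections == 1                    # exactly 1 zero-byte section hash
        and COMM_STRING in f.strings               # defined communication string
        and entropy(f.overlay) > OVERLAY_ENTROPY_THRESHOLD  # obfuscated overlay
        and 1 * MB <= f.size <= 2 * MB             # file size 1MB to 2MB
    )


sample = FileFacts(
    header=b"MZ\x90\x00",
    size=int(1.5 * MB),
    sections=[b"\x90" * 16] * 6 + [b""],   # 7 sections, one of them empty
    overlay=bytes(range(256)) * 16,        # uniform bytes: entropy of 8.0
    strings={COMM_STRING},
)
print(matches_generic_class(sample))  # True
```

Because every check must pass, a single missing characteristic (for example, a second empty section) drops the file out of the class, which is exactly what makes this approach precise but brittle.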
The second approach leverages fuzzy logic within YARA to match heuristically. Here, the goal is not exactness but accuracy in discovering characteristics: instead of matching on exact rules, you detect the file's characteristics with a flexible set of YARA rules. Think of it as the Socratic method, where you use YARA to iteratively ask questions. Some questions you might ask are:
- What type of file is it?
- Are countermeasures (obfuscation, packing, anti-debug, anti-VM, etc.) present?
- What are its components?
- What components are missing (that should be there)?
- What is the file’s composition?
- Are interesting strings present?
- Does it contain communication capability?
- How does it perform persistence?
- Does it do any file manipulation?
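The questioning style above can be sketched as a set of small, independent probes, each answering one question loosely and reporting a characteristic tag rather than a verdict. This is a Python stand-in, not YARA: in YARA each question would more naturally be its own small rule (YARA's `math` module, for instance, provides `math.entropy`). The probe names, tag strings, marker strings, and the 7.2 entropy cut-off are all hypothetical choices for illustration, not a vetted ruleset.

```python
import math
from collections import Counter
from typing import Callable, Optional


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; 0.0 for empty input."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


# Each probe asks one loose question and returns a tag, or None for "no".
# Markers and thresholds are illustrative assumptions.

def what_type(data: bytes) -> Optional[str]:
    for magic, tag in ((b"MZ", "type:pe"), (b"\x7fELF", "type:elf"), (b"%PDF", "type:pdf")):
        if data.startswith(magic):
            return tag
    return None


def has_countermeasures(data: bytes) -> Optional[str]:
    # Packed or encrypted content tends toward high entropy (assumed cut-off).
    return "countermeasure:high_entropy" if entropy(data) > 7.2 else None


def can_communicate(data: bytes) -> Optional[str]:
    lowered = data.lower()
    if any(tok in lowered for tok in (b"http://", b"https://", b"user-agent")):
        return "capability:network"
    return None


def persists(data: bytes) -> Optional[str]:
    # Registry Run key is one common persistence location.
    if b"currentversion\\run" in data.lower():
        return "capability:persistence_run_key"
    return None


def manipulates_files(data: bytes) -> Optional[str]:
    lowered = data.lower()
    if any(api in lowered for api in (b"deletefile", b"movefile")):
        return "capability:file_manipulation"
    return None


PROBES: list[Callable[[bytes], Optional[str]]] = [
    what_type, has_countermeasures, can_communicate, persists, manipulates_files,
]


def characterize(data: bytes) -> set[str]:
    """Ask every question and collect whatever characteristics turn up."""
    tags: set[str] = set()
    for probe in PROBES:
        tag = probe(data)
        if tag is not None:
            tags.add(tag)
    return tags


sample = b"MZ\x90\x00User-Agent: demo\x00SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\x00"
print(sorted(characterize(sample)))
```

The output is a profile of the file rather than a yes/no answer; adding a new question means adding one probe without touching the others, which mirrors the modular-rules argument from earlier in the post.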
The third approach identifies one characteristic of the file and then pulls the classes that contain that characteristic to look for a match. If no match is found, a different characteristic is identified and the process is repeated until either a match is found or the file's characteristics are exhausted.
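The pivot-and-retry loop of this third approach can be sketched in Python as follows. The class catalog, class names, and tag strings are hypothetical; each class lists the characteristics a file must show, and a class matches only when all of them are present. The choice to try the rarest characteristic first (the one contained in the fewest classes) is an assumption added here, on the reasoning that it pulls the smallest set of candidate classes; the approach as described does not prescribe an order.

```python
from typing import Mapping, Optional

# Hypothetical class catalog: class name -> characteristics a file must show.
CATALOG: dict[str, frozenset[str]] = {
    "ransomware_generic": frozenset({"type:pe", "capability:file_manipulation", "crypto:api"}),
    "downloader_generic": frozenset({"type:pe", "capability:network"}),
    "pdf_dropper": frozenset({"type:pdf", "capability:network"}),
}


def classify(traits: set[str],
             catalog: Mapping[str, frozenset[str]] = CATALOG) -> Optional[str]:
    """Pivot on one characteristic at a time until a class fully matches."""
    # Assumption: try the rarest characteristic first (ties broken by name).
    def rarity(trait: str) -> tuple[int, str]:
        return (sum(trait in required for required in catalog.values()), trait)

    checked: set[str] = set()
    for trait in sorted(traits, key=rarity):
        # Pull only the classes containing this characteristic not yet tried.
        candidates = [name for name, required in catalog.items()
                      if trait in required and name not in checked]
        for name in candidates:
            checked.add(name)
            if catalog[name] <= traits:  # every required characteristic present
                return name
    return None  # characteristics exhausted without a match


print(classify({"type:pe", "capability:network"}))  # downloader_generic
print(classify({"type:elf"}))                        # None
```

Tracking already-checked classes keeps the loop from re-testing a class that was pulled in by an earlier characteristic, so each class is evaluated at most once per file.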
These approaches can be summarized as follows: