Slicing Logic across Rules with YARA

Continuing our series on YARA, Part 3 (following Part 1 and Part 2) spends some time on dividing logic across multiple rules.

It is fast and easy to put together a monolithic rule, but that approach suffers when it comes time to extend, expand or combine the rule. Slicing logic across multiple rules provides the same benefits as modular programming, for much the same reasons.

YARA really only does a few things, but it does them so well, and is so versatile, that it can be applied to a wide range of situations. One of those tasks, and a chief one at that, is classifying files. To classify something, you must know its characteristics (elements, attributes, components, etc.) and how they fit within the class. That is how you determine whether it falls within the class or not. Classes can be wide (ransomware) or narrow (Crysis). They also have a family tree of mothers, fathers, cousins, siblings and other extended family. For example, ransomware is a close cousin to destructive malware, and both are descendants of malware. How you define the classes and what you name them is up to you. Kaspersky, for one, publishes its own classification scheme.

I’ve seen three approaches to classifying files that seemed to work. The first is to iteratively match class characteristics to an unknown file. The second is to identify the characteristics of an unknown file and look for a match. The third is a composite of the two.

Approach One

For the first approach, you must already have boiled down class characteristics into rules. This represents a more classic use of YARA, where you run logically grouped rules against an unknown file. The rules are very specific and usually declarative in nature. For example, a class might be defined as:

Generic Class Characteristics

  • Portable Executable
  • 6-9 code sections
  • 1 section has a zero-byte hash value
  • Defined communication string
  • Overlay of obfuscated data at EOF
  • File size between 1MB and 2MB
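
The characteristics above lend themselves to one small check per bullet, combined by a class definition. The sketch below illustrates that shape in Python rather than YARA, under loud assumptions: `FileFacts`, its field names and `COMM_STRING` are hypothetical stand-ins for values a PE parser would extract, and "zero-byte hash value" is read here as a section whose hash equals the MD5 of empty input. In YARA, each check would typically be its own rule, referenced from the class rule's condition.

```python
import hashlib
from dataclasses import dataclass, field

# MD5 of empty input; the assumed meaning of a "zero-byte hash value".
EMPTY_MD5 = hashlib.md5(b"").hexdigest()
MB = 1024 * 1024
COMM_STRING = "example-comm-marker"  # hypothetical placeholder string


@dataclass
class FileFacts:
    """Hypothetical, pre-extracted facts about one file."""
    is_pe: bool
    size: int
    code_sections: int
    section_md5s: list = field(default_factory=list)
    strings: set = field(default_factory=set)
    has_obfuscated_overlay: bool = False


# One small check per characteristic, like one rule per idea.
def is_portable_executable(f):
    return f.is_pe


def has_six_to_nine_code_sections(f):
    return 6 <= f.code_sections <= 9


def has_one_empty_section(f):
    return f.section_md5s.count(EMPTY_MD5) == 1


def has_comm_string(f):
    return COMM_STRING in f.strings


def overlay_is_obfuscated(f):
    return f.has_obfuscated_overlay


def size_between_1_and_2_mb(f):
    return MB <= f.size <= 2 * MB


# The class is just the list of checks it requires.
GENERIC_CLASS = [
    is_portable_executable,
    has_six_to_nine_code_sections,
    has_one_empty_section,
    has_comm_string,
    overlay_is_obfuscated,
    size_between_1_and_2_mb,
]


def matches_class(facts, checks):
    """True only when every check in the class passes."""
    return all(check(facts) for check in checks)


def failed_checks(facts, checks):
    """Names of the checks that did not pass, for triage."""
    return [check.__name__ for check in checks if not check(facts)]
```

Because each check stands alone, a second class can reuse `is_portable_executable` or `size_between_1_and_2_mb` without copying any logic, which is the practical payoff of slicing.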

Approach Two

The second approach leverages fuzzy logic within YARA to match heuristically. The goal is not exactness, but accuracy in discovering characteristics: you detect the characteristics of the file with a flexible set of YARA rules instead of matching on exact rules. Think of it as the Socratic method, using YARA to iteratively ask questions. Some questions you might ask are:

  • What type of file is it?
  • Are countermeasures (obfuscation, packing, anti-debug, VM, etc.) present?
  • What are its components?
  • What components are missing (that should be there)?
  • What is the file’s composition?
  • Are interesting strings present?
  • Does it contain communication capability?
  • How does it perform persistence?
  • Does it do any file manipulation?
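
A few of these questions can be asked directly of a file's raw bytes. The sketch below is a Python illustration of the question-and-answer shape, not a set of YARA rules. The magic-byte table, the entropy threshold of 7.2 and the use of the Windows Run key as a persistence hint are all assumptions picked for the example, and `profile` is a hypothetical helper name.

```python
import math
import re
from collections import Counter

# Magic-byte prefixes for a few common formats (far from exhaustive).
MAGIC = {
    b"MZ": "pe",
    b"\x7fELF": "elf",
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip",
}

# Printable run following an http:// or https:// prefix.
URL_RE = re.compile(rb"https?://[\x21-\x7e]{4,}")

# Example persistence hint: the Windows Run registry key path.
RUN_KEY = b"Software\\Microsoft\\Windows\\CurrentVersion\\Run"


def shannon_entropy(data):
    """Bits per byte: 0.0 for one repeated value, up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data).values()
    return -sum(n / total * math.log2(n / total) for n in counts)


def file_type(data):
    """What type of file is it? Answered from the leading bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"


def profile(data):
    """Ask each question in turn and collect the answers."""
    return {
        "type": file_type(data),
        # High entropy only hints at packing; 7.2 is an assumed cutoff.
        "maybe_packed": shannon_entropy(data) > 7.2,
        "urls": URL_RE.findall(data),
        "run_key_persistence": RUN_KEY.lower() in data.lower(),
    }
```

The answers are deliberately loose. A profile like this narrows the field of candidate classes, and no single answer is treated as proof.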

Approach Three

The third approach identifies a characteristic of the file and then pulls the classes that contain that characteristic to look for matches. If no match is found, a different characteristic is identified and the process is repeated until a match is found or it is clear that no class matches.
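
The loop described above can be sketched in a few lines of Python. Everything named here is hypothetical: `CLASSES` maps made-up class names to made-up characteristic labels, `file_has` stands in for whatever actually tests the file (a YARA rule match, for instance), and `probe_order` is the order in which characteristics are tried.

```python
# Hypothetical classes, each defined by the characteristics it requires.
CLASSES = {
    "ransomware": {"pe", "encrypts_files", "ransom_note"},
    "destructive": {"pe", "wipes_disk"},
}


def classify(file_has, probe_order, classes):
    """Return (class name or None, characteristics tried so far)."""
    tried = []
    for characteristic in probe_order:
        tried.append(characteristic)
        if not file_has(characteristic):
            continue  # the file lacks it, so identify a different one
        # Pull only the classes that contain this characteristic.
        candidates = {
            name: required
            for name, required in classes.items()
            if characteristic in required
        }
        # Look for a class whose full set of characteristics is present.
        for name, required in candidates.items():
            if all(file_has(c) for c in required):
                return name, tried
    return None, tried
```

For a file showing only `pe` and `wipes_disk`, probing `ransom_note` first finds nothing, probing `pe` pulls both classes, and only `destructive` has every required characteristic present.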

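These approaches can be summarized as follows: the first tests known class characteristics against the file, the second profiles the file and then looks for a class that fits, and the third combines the two.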

About the author

Monty St John

Monty is a security professional with more than two decades of experience in threat intelligence, digital forensics, malware analytics, quality services, software engineering, development, IT/informatics, project management and training. He is an ISO 17025 laboratory auditor and assessor, reviewing and auditing 40+ laboratories. Monty is also a game designer and publisher who has authored more than 24 products and 35 editorial works.