Building Blocks of Success with YARA


I like YARA.
In fact, I think it is one of the more flexible and powerful tools in an incident responder’s toolkit. The same goes for threat intelligence analysts and folks in digital forensics. A chief appeal of this fabulous program is that it is open source and integrated into dozens of tools that are likely already in your toolset. Not to mention, YARA’s ability to encode critical-thinking methodology and its support for fuzzy logic make it pretty much as powerful as PowerShell, Python, Ruby, or any other scripting language you would care to name. It is nowhere near as broad in scope, but it is definitely king when it comes to pattern matching across any file type, even those armored up to the hilt with protective measures. Packed? No problem. YARA can identify the packer and, with the right logic, point you to every aspect of that packing discoverable via static analysis, which helps you tackle unknown or custom packers. Obfuscated? Different question, same problem, same answer. FUD or crypter? Again, YARA plays its part to lead the way. With all this goodness, you must be asking: what’s the fly in the ointment, the unseen flaw in the masterpiece? Look in the mirror. It’s you. YARA is only as good as the logic you give it to work with.

YARA is a lawful being. It follows the rules. Feed it poorly contrived, limited rules and it will perform to that level of expectation. Provide smart, well-thought-out guidelines and it explodes with capability.

Let me disclose for clarity. I teach people YARA. Not just the “hey, buddy, let me show you how this works” kind of teaching that you do for OJT, but also as an instructor in front of a lineup of students with varying skill levels. I see two things over and over.

The first is that people, even the talented, experienced ones, are short-sighted when crafting their logic. They think about the issue in front of them, and the rules they construct are not reusable; they don’t scale, extend or otherwise remain useful outside the context of that immediate event. It is a bit too much “one and done”: it solves the immediate issue but fails in the long term, because the work must be redone for the next incident that happens.

The other is tunnel vision. The logic of their rules and their use of YARA is aligned too narrowly. They craft something that works at the command line but fails miserably when attempted across the enterprise or when imported into a security tool. It works fine when run in isolation, even against thousands of targets, let’s say, but flies off the cliff when applied more broadly. The culprit is usually (a) scope or (b) inefficiency. Scope problems arise when a rule’s logic reaches for too much and ends up matching too much (false positives) or nothing at all (false negatives). This happens when a rule is tuned to work in isolation but is not scoped to hold up in a different context. Inefficiency comes from not understanding the weight of each rule and the logic it contains. Regexes are a common killer here. Yes, they are awesome, but they are slow, inefficient beasts that perform poorly compared to atom matching with text or hex strings. Drawing from my own experience, at least half of all regexes can be completely replaced with equivalent hex strings that use jumps, combined with positioning or occurrence logic in the condition line.
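To make the regex-versus-hex point concrete, here is a minimal sketch. The rule names, the pattern and the thresholds are all hypothetical, chosen only for illustration: the first rule uses a regex to find "cmd" followed by up to eight arbitrary bytes and then ".exe"; the second expresses the same pattern as a hex string with a jump and moves occurrence and position checks into the condition line.

```yara
rule Hypothetical_Regex_Form
{
    strings:
        // Regex form: "cmd", up to 8 arbitrary bytes, then ".exe".
        // The /s modifier lets "." match newline bytes as well.
        $re = /cmd.{0,8}\.exe/s
    condition:
        $re
}

rule Hypothetical_Hex_Form
{
    strings:
        // 63 6D 64    = "cmd"
        // [0-8]       = jump over 0 to 8 arbitrary bytes
        // 2E 65 78 65 = ".exe"
        $hx = { 63 6D 64 [0-8] 2E 65 78 65 }
    condition:
        // Illustrative extra constraints (assumed thresholds):
        // at least two occurrences, the first within the first 4096 bytes.
        #hx >= 2 and @hx[1] < 4096
}
```

Whether the hex form is actually faster for a given pattern is worth measuring against your own sample set; treat this as a starting point rather than a benchmark.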

YARA is very strong at declarative and connective matching. If you feed it absolutes (for example, A, B, C and D must be present and E must not), it will match with speed and effectiveness, no matter how complex that declared logic is. Its next superpower is cause and effect, perhaps best described as “if…then”, though YARA doesn’t use those terms.
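As a rough illustration of that kind of absolute, the sketch below requires four strings to be present and a fifth to be absent. The rule name and every string value are placeholders I made up, not indicators from any real sample.

```yara
rule Hypothetical_Declarative_Match
{
    strings:
        // Placeholder values; swap in real indicators.
        $a = "placeholder_string_A"
        $b = "placeholder_string_B"
        $c = { DE AD BE EF }
        $d = "placeholder_string_D" wide
        $e = "placeholder_string_E"
    condition:
        // A, B, C and D must be present and E must not.
        all of ($a, $b, $c, $d) and not $e
}
```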

Consider the following condition line:

(#a == 10) and (@b > filesize - 300)

This statement says the rule will evaluate true IF the count of the string defined as $a in the target file is exactly 10 (no more, no less) AND the first occurrence of the string defined as $b starts within the last 300 bytes of the file. The IF portion is best described as the condition line statement, and the THEN is always going to be True or False. Most people attain this level of understanding when they begin writing YARA rules. Where they fall down is in applying it to rules and not just condition lines: IF rule 1 THEN rule 2 and rule 3, ELSEIF rule 2 and NOT rule 1 THEN rule 4, and so on. The true heights of cause and effect are realized with YARA when you distribute the logic across multiple rules and employ the right constraints to achieve accuracy of matching. That discussion I’ll save for the next part of this series.
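For reference, here is one way the condition line could sit inside a complete rule, followed by a second rule that builds on the first. This is only a small sketch of the idea: the rule names and string values are hypothetical, and the condition uses YARA's lowercase `and` operator.

```yara
rule Hypothetical_Rule_1
{
    strings:
        // Placeholder strings for illustration only.
        $a = "placeholder_marker"
        $b = "placeholder_trailer"
    condition:
        // Exactly 10 occurrences of $a, and the first $b
        // starts within the last 300 bytes of the file.
        (#a == 10) and (@b > filesize - 300)
}

rule Hypothetical_Rule_2
{
    strings:
        $c = "placeholder_config"
    condition:
        // "IF rule 1 THEN ...": a rule can reference another rule
        // by name, provided that rule is defined earlier.
        Hypothetical_Rule_1 and $c
}
```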

About the author

Monty St John

Monty is a security professional with more than two decades of experience in threat intelligence, digital forensics, malware analytics, quality services, software engineering, development, IT/informatics, project management and training. He is an ISO 17025 laboratory auditor and assessor, reviewing and auditing 40+ laboratories. Monty is also a game designer and publisher who has authored more than 24 products and 35 editorial works.