I’ve mentioned before that I like YARA. It’s been nice that, over the past few years, I’ve averaged about six classes a year. I’m batting a higher average for 2017, but I’m by no means complaining. It is one of my favorite subjects to teach.
While I was teaching a recent YARA class, a student cornered me during a break. He homed in on how the objective for one lab could create a false sense of success. Since improving the class is a regular objective of mine, I asked for more information. His thinking was that the way I asked students to build their YARA rule would definitely result in detection, but not a thorough and accurate detection. He raised a few other points, and we talked about the lab and its objectives in class. The majority agreed with him, and I deferred answering the question until a little later. A few labs later, we revisited the topic and the class’s perspective had changed. Where most of the class had previously agreed with him, most flipped to the opposite position after we worked the intervening labs. The key, of course, was perspective.
If the lab was taken in isolation, or even in light of everything built up to that point, then yes, I agreed with him. We were putting the blinders on. The element missing from his argument was that I knew where we were going and he did not. The objective for the lab was to perform loose matching. In this case, I asked students to draft a YARA rule to detect URL traces in a PDF. I told them it was okay if they stopped at just detecting http/https, ftp, and mailto. This one student went well above and beyond that trivial detection method. His detection was a masterpiece of regex and very accurate, and it also met the requirement. We went on with further labs and built on our existing rules to narrow that URL trace detection to specific sections of the PDF, then to proximity to other PDF actions, and eventually to within AcroForms. Then we tightened up the detection to make it more accurate, building sharper detections for URLs in regex and in hex, using hex jumps and alternatives to make a stronger, tighter detection.
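A minimal sketch of that loose-to-tight progression might look like the following. The rule names, the exact patterns, and the assumption that the PDF header sits at offset 0 are all illustrative choices, not the lab solutions from the class.

```yara
// Loose: any common URL scheme prefix anywhere in the file counts as a hit.
rule URL_Loose
{
    strings:
        $http   = "http://" nocase
        $https  = "https://" nocase
        $ftp    = "ftp://" nocase
        $mailto = "mailto:" nocase
    condition:
        any of them
}

// Tighter: the file starts with the PDF magic, and a /URI key is followed
// closely by an opening parenthesis and a URL scheme.
rule PDF_URI_Tight
{
    strings:
        $pdf = "%PDF-"
        // "/URI", a jump of 0-4 bytes, "(", then "http" or "ftp" (case-sensitive)
        $uri_hex = { 2F 55 52 49 [0-4] 28 ( 68 74 74 70 | 66 74 70 ) }
        // scheme, "://", then a host-like run of characters
        $url_re = /(https?|ftp):\/\/[A-Za-z0-9.-]{1,253}/ nocase
    condition:
        $pdf at 0 and $uri_hex and $url_re
}
```

The loose rule will fire on almost any document containing a link. The tight rule trades that coverage for context, which is the point of the progression.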
Moving from loose matching to tighter matching meant we could reuse earlier loose detections in different contexts within the conditional logic of later rules. It allowed the introduction of loose detection as a strategy to find interesting things, followed by tighter and tighter detections to home in on something specific. By the time the students had finished the seventh lab, they had, give or take, about 100 rules. About a third defined file types and then descended into detections for various elements common to many files. The other two-thirds focused heavily on PDF and email files at this point (PE and file types come afterward), with a few very specific rules. At this point, the class started to understand where we were heading. Rules constructed earlier in the day, including the “too loose” ones, were being fed into later rules. By the end of the first day of the class, the majority of the 50 or so rules written for the later labs leveraged at least one earlier rule, and some leveraged two, three or more. Roughly a dozen for each student eschewed a strings section completely and consisted of previous rules and additions in the condition only. That progression and reuse is an underlying objective of day one, and it was nice to watch the dawning realization on students’ faces as we moved past the tipping point into reuse and modularity.
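To illustrate that kind of reuse, here is a small sketch in which the last rule has no strings section at all and is built only from earlier rules plus one extra condition. The rule names and the 1MB size limit are hypothetical, chosen only for the example.

```yara
// Early, loose building block: file typing.
rule Is_PDF
{
    strings:
        $magic = "%PDF-"
    condition:
        $magic at 0
}

// Early, loose building block: any web URL scheme.
rule Has_URL_Scheme
{
    strings:
        $http  = "http://" nocase
        $https = "https://" nocase
    condition:
        any of them
}

// Later rule with no strings section: earlier rules plus an addition
// in the condition only.
rule Small_PDF_With_URL
{
    condition:
        Is_PDF and Has_URL_Scheme and filesize < 1MB
}
```

YARA requires a referenced rule to be defined before the rule that uses it, so the building blocks have to come first in the file or be included ahead of it.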
It’s easy to put the blinders on. It doesn’t take much to lose your focus, and keeping the goal in mind is critical, not only in rule development but also, and especially, when you use someone else’s rules. It’s a bad idea to assume all rules are written with the same objective in mind. Some are written against wild targets, some against refined ones. Some are heuristic in nature and others are very specific to a single, unique target. Hopefully, that intent is captured in the metadata within the rule and within the rule set. Otherwise, we quickly lose sight of why the rules were drafted and how they should be used.
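As one possible way to record that intent, a rule’s meta section can carry arbitrary key/value pairs. The keys below (objective, match_style, standalone) are not a standard; they are assumptions made up for this sketch.

```yara
rule Is_PDF_Loose
{
    meta:
        author      = "your name here"
        description = "File begins with %PDF; says nothing about maliciousness"
        objective   = "file typing"   // hypothetical key
        match_style = "loose"         // hypothetical key
        standalone  = false           // hypothetical key: meant as a building block
    condition:
        // bytes 25 50 44 46 ("%PDF") read as a little-endian 32-bit integer
        uint32(0) == 0x46445025
}
```

YARA does not act on these values when matching. They are there for the next person who picks up the rule.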