
YARA Hashing Magic


Back a few years before I started in digital forensics, hashing had a whole different context to me.  Back then, if you were “hashing” you were imbibing heavily and then going for a run, something I saw pretty much every morning when I was overseas.  Not that we didn’t have a bunch of other names for it that are probably inappropriate for a blog post, but hashing was what stuck out of the metaphorical mudslinging back then.

 

Hashing obviously has a completely different meaning today; in fact, most people likely have no idea of the previous usage. Hashing with YARA doesn’t mean we are going to swill some spirits and dash out on a 5K. It does mean we are going to use hashes (message digests of files, processes and other fun things) to do some investigative matching. Before trundling too far down that road, a brief touch on objectives is in order. Since hashing in our context means creating a shortcut of sorts for any content we run it against, using it wisely is requisite. Hashing, of course, implies we are testing for equality: that the content I just hashed exists in the target I am investigating. You can do that on a full-content basis, such as checking the hash of an entire file. In my example below, the condition checks whether the target I passed matches the hash of notepad.

 

// Note: IsPE is a previously defined rule that determines whether a file is a PE file or not.

import "hash"

rule notepad_by_hash : hash genuinetools {
    meta:
        description = "notepad hash matching"
    condition:
        IsPE and
        filesize < 350KB and
        hash.md5(0, filesize) == "e30299799c4ece3b53f4a7b8897a35b6"
}
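The rule above leans on IsPE without showing it. A minimal sketch of what such a helper might look like follows. It is an assumption for illustration, not necessarily the definition used in my own rule set. It checks for the MZ magic at offset 0 and for the PE signature at the offset stored at 0x3C.

```yara
// Hypothetical stand-in for the IsPE helper; the original definition is not shown in this post.
private rule IsPE {
    condition:
        // "MZ" DOS header magic at offset 0 (0x5A4D when read as a little-endian 16-bit value)
        uint16(0) == 0x5A4D and
        // "PE\0\0" signature at the offset stored in the e_lfanew field (0x3C)
        uint32(uint32(0x3C)) == 0x00004550
}
```

If you use something like this, place it in the same file ahead of notepad_by_hash so the reference to IsPE resolves when the rules compile.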

 

 

Now use the following command to search the current directory for files matching that hash:

// This assumes the rule is saved in an equivalently named .yar file. The "." is just convenient shorthand for the current directory.

yara notepad_by_hash.yar .

 

Personally, I like more flexible rules, though I’ll freely admit I’ve a long rule set of simple rules like the above.  More flexible, in this case, might mean leveraging external variables, where I pass the hash value via the external variable.

 

[snip]

    condition:
        IsPE and
        filesize < 350KB and
        hash.md5(0, filesize) == isNotepad
}

yara -d isNotepad="e30299799c4ece3b53f4a7b8897a35b6" notepad_by_hash.yar .
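Since the rule header is snipped above, here is one way a complete external-variable rule could look. Treat it as a sketch: the rule name pe_by_external_md5, the external target_md5 and the file name in the comment are all hypothetical, and a bare MZ check stands in for IsPE so the file compiles by itself.

```yara
import "hash"

// Hypothetical names: pe_by_external_md5 and target_md5 are illustrative only.
rule pe_by_external_md5 {
    meta:
        description = "match a PE file whose MD5 equals an externally supplied value"
    condition:
        uint16(0) == 0x5A4D and   // stand-in for IsPE: "MZ" magic at offset 0
        filesize < 350KB and
        // target_md5 is not declared in the rule; it must be supplied with -d
        hash.md5(0, filesize) == target_md5
}

// Possible invocation (the file name is an assumption):
//   yara -d target_md5="e30299799c4ece3b53f4a7b8897a35b6" pe_by_external_md5.yar .
```

The external has to be defined when the rules are compiled, so leaving out the -d switch should produce an undefined identifier error rather than a silent non-match. The hash module returns lowercase hex strings, so pass the value in lowercase.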

Obviously, you could repeat either approach by passing multiple external variables with the -d switch or by adding or logic with other hashes to check. What if I wanted to match against something smaller than the whole file? Easy enough. Change the size of the swath of data you are targeting for the hash. Keep in mind that the two arguments are an offset and a size, not a start and an end. So, instead of hash.md5(0, filesize) you look at hash.md5(300, 200) for the 200 bytes starting at offset 300, or hash.md5(filesize-200, 200) for the last 200 bytes of the file. In fact, to paint a scenario, let’s say those 200 bytes at the end represent EOF data: additional data that, while consistent in size (200), isn’t consistent in content, except for a single 20-byte chunk that we’ve been able to hash consistently. It’s a perfect scenario for combining a loop over a set of offsets with hashing. Let’s focus on the condition line, since that’s where the magic is happening.

 

    condition:
        IsPE and // since it's a PE file
        filesize < 420KB and // and we are looking for a specific file size
        // hash.md5 takes (offset, size), so each pass hashes the 20 bytes starting at filesize-i
        for any i in (20,40,60,80,100,120,140,160,180,200) : ( hash.md5(filesize-i, 20) == "302f73788a2dcfac52f4a9b3397c35f6" )
}

 

The first two elements of the condition are just good form to make sure we are looking at the right files. IsPE and the filesize restriction take advantage of short-circuit logic for efficiency and make sure we only hash files that meet those criteria. The last part, however, is our iterative logic. It looks at 20-byte chunks in the last 200 bytes of the file and checks for a hash match. It will only work, however, if the content we expect starts on a 20-byte boundary counted back from the end of the file. Since that’s unlikely and this rule is going to be slow anyway (be careful about where you apply it), let’s adjust it to be more precise:

 

    condition:
        IsPE and // since it's a PE file
        filesize < 420KB and // and we are looking for a specific file size
        // hash.md5 takes (offset, size); try every 20-byte window that starts in the last 200 bytes
        for any i in (20..200) : ( hash.md5(filesize-i, 20) == "302f73788a2dcfac52f4a9b3397c35f6" )
}

 

Now we are checking every possible starting offset, one byte at a time. It’s not pretty, and plenty of other ways exist to do this (like a hex string for the content instead), but it does demonstrate a use case for the hash module. Beyond MD5, you can use SHA1 and SHA256 values to do the same thing; just replace the hash.md5 function call with hash.sha1 or hash.sha256.
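To make those last two points concrete, here is a rough sketch of both variations: the same sliding window using hash.sha256, and the hex string alternative. The rule names are hypothetical, a bare MZ check stands in for IsPE, and both the digest and the byte pattern are placeholders, since I have no real 20-byte chunk to share here. The digest shown is simply the SHA-256 of empty input, so it will never match a 20-byte window until you substitute your own value.

```yara
import "hash"

// Hypothetical rule: same sliding 20-byte window over the tail, using SHA-256 instead of MD5.
rule tail_chunk_by_sha256 {
    condition:
        uint16(0) == 0x5A4D and   // stand-in for IsPE
        filesize < 420KB and
        for any i in (20..200) : (
            // Placeholder digest (SHA-256 of empty input); replace with the chunk's real hash.
            hash.sha256(filesize - i, 20) == "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
        )
}

// Hypothetical rule: the hex string alternative, which lets the scanner locate the bytes directly.
rule tail_chunk_by_bytes {
    strings:
        // Placeholder bytes; replace with the real 20-byte chunk.
        $chunk = { 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 }
    condition:
        uint16(0) == 0x5A4D and   // stand-in for IsPE
        filesize < 420KB and
        filesize >= 200 and
        // The match must start early enough for all 20 bytes to fit in the last 200.
        $chunk in (filesize - 200..filesize - 20)
}
```

The hex string version only helps when you actually hold the bytes. If all you have is a hash from a report or a feed, the sliding window is the fallback, at the cost of up to 181 hash computations per candidate file.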

About the author

Monty St John

Monty is a security professional with more than two decades of experience in threat intelligence, digital forensics, malware analytics, quality services, software engineering, development, IT/informatics, project management and training. He is an ISO 17025 laboratory auditor and assessor, reviewing and auditing 40+ laboratories. Monty is also a game designer and publisher who has authored more than 24 products and 35 editorial works.
