Pattern Recognition, Analysis and Profiling for Investigations and Threat Hunting

Pattern Recognition

We’ve received a few questions about what we cover in our Pattern Recognition, Analysis and Profiling class. If you are a cyber investigator or analyst – or any role that requires you to extract meaningful information for threat hunting or investigative activities – understanding how to efficiently and effectively identify patterns in huge volumes of data and complex systems is invaluable. This class was developed to help you gain knowledge that you can directly apply to this important work. While the syllabus outlines what’s in the class at a high level, there is only so much detail we can provide in that format. So here’s the breakdown of what you can expect at a deeper level.

The class is split into three days and begins with recognition and discovery of simple patterns before stepping up to progressively more complex patterns each day. We’ve designed the class around information that can be immediately and practically applied in real-world situations. For instance, instead of taking an academic approach, we focus on pattern discovery and creation around common tasks that land on investigators’ or analysts’ desks.

We’ve made the class even more practical by employing tools already at your disposal, instead of introducing new tools. For example, regex is a key component of finding patterns in a variety of data, and it is a tool you already have. Using regex to find a text pattern match in a single file is a given. Expanding that scope across many files—an entire operating system or an enterprise—might not be as obvious. Additionally, you can use it to find sequences of matches, or matches between sets of unordered data in a file or across many files. Our task is to show how to use it in orthodox and unorthodox ways, both familiar and potentially surprising.
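To make that scope expansion concrete, here is a minimal sketch of taking a regex match beyond a single file with plain grep. The directory, file names, and IPv4-style pattern are illustrative stand-ins, not class material.

```shell
# Illustrative setup: two small log files in a scratch directory.
mkdir -p /tmp/prx_demo
printf 'connect 10.0.0.5\nhello\n' > /tmp/prx_demo/a.log
printf 'no addresses here\n' > /tmp/prx_demo/b.log

# -r recurses through the tree, -E enables extended regex,
# -l lists only the files that contain a match.
grep -rEl '([0-9]{1,3}\.){3}[0-9]{1,3}' /tmp/prx_demo
```

The same command pointed at `/` (or fed paths from an enterprise file share) is the "entire operating system" version of the idea; only the starting directory changes.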

Day One steps back from the big stuff and ensures a good foundation for patterns – their recognition and creation – before taking the cliff dive into bigger things. That’s done via some PowerPoint (no way around that), but the class rapidly gets into practical applications.

Patterns are found and created using household tools like grep, sed, and awk. Day One sets the stage for the later days and more complex activity. The practice is done across tasks that can be irksome, but good to perform depending on your investigation:

  • Parsing chat logs
  • Sorting through emails
  • Frequency analysis on web browsing or searching
  • Processing credential dumps
  • Parsing terminal/shell history
  • …Plenty more
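As a taste of the practice involved, here is a minimal sketch of one such task – pulling email addresses out of a credential-dump-style file and ranking them by frequency. The file name, its contents, and the regex are illustrative assumptions, not class data.

```shell
# Illustrative dump file in user:password form.
cat > /tmp/dump.txt <<'EOF'
alice@example.com:hunter2
bob@example.com:letmein
alice@example.com:password1
EOF

# -E extended regex, -o print only the matching text;
# then count duplicates and sort the counts descending.
grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+' /tmp/dump.txt | sort | uniq -c | sort -nr
```

The `sort | uniq -c | sort -nr` tail is the same frequency-analysis idiom that works for web browsing history or search terms from the list above.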

Nothing demonstrates better than an example. Using the last item above, parsing terminal/shell history, let’s parse it for commands related to network enumeration—something that might occur during an assessment, incident response, or investigation of activity. The table that follows contains commands that might be executed for enumeration:


These are part of a pattern of activity that I’m interested in reviewing. The simple pattern is discovering their presence in the history; the larger pattern identifies the order, context, and results of those commands.

The start of that can be performed as in the example below:

history | awk '{CMD[$2]++;count++;}END { for (a in CMD)print CMD[a] " " CMD[a]/count*100 "% " a;}' | grep -v "./" | grep -f enum_list.txt | column -c3 -s " " -t | sort -nr | nl | head -n10

The breakdown of this command follows:

  • The “history” command provides a list of commands run in the terminal.
  • The “awk” command will help us format the output.
    • The first thing we ask awk to do is count the commands; it counts field $2 because the first field of history’s output is the line number.
    • The second is to print them.
    • The third is to turn the count into a percentage.
  • The “grep” command is the all-purpose scanner.
    • First we ask it to return all lines that don’t match “./”, filtering out commands run from a relative path, which we don’t want here.
    • Next we pass the list of keywords outlined in the table above, stored in a file called “enum_list.txt”.
  • The “column” command does what you’d expect, placing the data into columns separated by a space.
  • The “sort” command sorts the data numerically in reverse order.
  • The “nl” command numbers the lines to be printed.
  • The “head” command constrains the output to the first 10 lines.
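Running the pipeline outside a live terminal session takes one small adjustment, since a non-interactive shell has no `history` to query. The sketch below applies the same counting logic to a saved history file; the file paths, the sample commands, and the contents of enum_list.txt are hypothetical stand-ins (the class supplies its own keyword table). It counts field $1 rather than $2 because a saved file lacks history’s leading line numbers.

```shell
# Hypothetical keyword list -- illustrative enumeration command names.
cat > /tmp/enum_list.txt <<'EOF'
nmap
netstat
arp
ifconfig
EOF

# Hypothetical saved shell history to stand in for `history` output.
cat > /tmp/hist.txt <<'EOF'
nmap -sV 10.0.0.5
ls -la
netstat -antp
nmap -p- 10.0.0.5
EOF

# Same logic as the class pipeline: count each command name ($1 here),
# convert to a percentage, keep only keyword matches, highest count first.
awk '{CMD[$1]++;count++;}END{for (a in CMD) print CMD[a], CMD[a]/count*100"%", a;}' /tmp/hist.txt \
  | grep -f /tmp/enum_list.txt | sort -nr
```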

As you can imagine, a next step might be to examine those commands in a specific order, to find out whether a sequence pattern exists.
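One simple way to sketch that sequence check: flatten the command stream onto a single line and let a regex assert the order. This is an illustrative approach with hypothetical file names and commands, not the class’s full method.

```shell
# Hypothetical saved history: recon followed by access.
cat > /tmp/hist.txt <<'EOF'
ifconfig
nmap -sV 10.0.0.5
ssh admin@10.0.0.5
EOF

# Reduce each line to its command name, join into one line,
# then ask whether recon (nmap) precedes access (ssh).
awk '{print $1}' /tmp/hist.txt | tr '\n' ' ' | grep -Eq 'nmap.*ssh' \
  && echo "sequence found"
```

Because `.*` spans the joined stream, the match tolerates any number of intervening commands while still enforcing the order.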

I’ll leave the rest for the class and look forward to seeing you there.

Sign up for the upcoming course or learn more by visiting our academy page.

About the author

Monty St John

Monty is a security professional with more than two decades of experience in threat intelligence, digital forensics, malware analytics, quality services, software engineering, development, IT/informatics, project management and training. He is an ISO 17025 laboratory auditor and assessor, reviewing and auditing 40+ laboratories. Monty is also a game designer and publisher who has authored more than 24 products and 35 editorial works.