Many of the intelligence and SIEM tools cybersecurity and IT pros use day in and day out have a machine learning component that falls into the category of Artificial Intelligence (AI). For example, it could be argued that the AlienVault Unified Security Management (USM) product, with its SIEM capabilities, operates with some AI characteristics and capabilities. That may sound surprising, but it’s easy to trace, at a high level, the growing sophistication its machine learning components have gathered over the product’s history.
Generalizing slightly, machine learning in products like AlienVault USM works (well, actually learns is the right word) by finding relationships within the data it sees and applying that knowledge to real-life situations – in our case, all that security data we are throwing at the SIEM. It does its magic in the background with some previous training, further supplemented by information from you. Yes, by your hand.
Training comes from you in the form of the changes you’ve made to rules and data sources, and the other deviations from the defaults you’ve introduced to make your version of the SIEM function in your environment.
When Machine Learning and AI Go Awry, We’re to Blame
In a perfect world, this approach lets the system accurately interpret new information, but poorly thought-out rules can sway its decisions in the wrong direction.
How many suppression rules did you begin with and how many were in place after a few months?
Some of these rules are going to be important. They are built to help discern what is truly a problem and what is merely an event that doesn’t present a security threat. Each rule you apply trains the AI functionality. That’s a crucial distinction to keep in mind. In the AlienVault example, each rule is telling the tool that something is or isn’t important – that it doesn’t need to care, or that it really should care.
Suppression Rules Can Diminish AI’s Effectiveness
The savvy layman knows that to truly shut down problems you must cut them off at the source. Have you counted the number of filtering rules you might have in play? Did it grow well beyond the initial set? These rules kill events at the source, or in our case, the sensor.
They are different from suppression rules, which take an event out of your view but don’t actually shut it off. Like filtering rules, suppression rules should be used sparingly and with a sunset in mind, e.g., suppress for a set period of time or a set number of observed occurrences.
Suppression rules are highly useful in situations where an event is causing an overload of alarms that you need to keep from view for a short period of time. The key is not making this type of view permanent, which unfortunately is more often the case. Don’t feed your AI the wrong food. Give it quality input. Make sure you understand your long-term SIEM requirements before entering the wrong rules.
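Suppression with a built-in sunset can be modeled explicitly. Here is a minimal sketch in Python – the class and field names are my own illustration, not any SIEM’s actual API – of a rule that stops suppressing once a deadline passes or an occurrence cap is hit:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SuppressionRule:
    """A suppression rule with two sunsets: a deadline and a hit cap."""
    event_name: str           # event this rule suppresses (hypothetical field)
    expires_at: datetime      # sunset by time
    max_occurrences: int = 1000  # sunset by volume
    seen: int = 0             # occurrences suppressed so far

    def should_suppress(self, event_name: str, now: datetime) -> bool:
        # Once either sunset is reached, stop suppressing so the
        # events become visible again and get reviewed.
        if now >= self.expires_at or self.seen >= self.max_occurrences:
            return False
        if event_name != self.event_name:
            return False
        self.seen += 1
        return True


# Suppress a noisy event until Jan 2 or 1,000 hits, whichever comes first.
rule = SuppressionRule(event_name="noisy-heartbeat",
                       expires_at=datetime(2024, 1, 2))
```

The point of the sketch is that expiry is part of the rule itself, so a temporary suppression can never silently become permanent.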
SIEMs provide a wide array of orchestration rules that train your AI to make good (and bad) decisions. Two key elements are in play here:
- The first element is that you make good rules, be they filtering, suppression, notification, alarm or others. Know when to actually drop events and issues so that they are not even collected versus when to choose not to see them for a period.
- The second element is that you review and re-evaluate anything you have put in play. Pick a period, be it quarterly or longer, to double check that the logic you are running is still valid.
Unsound Logic Can Poison Results
Once these two elements are resolved, and you have the scope and concept of what should be handled – and how and when – a very crucial consideration follows: making sure the logic is correct. Logical errors are the toughest problems to identify and the hardest to remove once the logic is in action.
Curious about what I mean?
Orchestration rules are built on operators. Some are very straightforward, such as an equals (“=”) operator, in which a specific field equals a specific value. Others are broader, such as the Contains operator, which checks for the presence of a specific fragment of a string within a longer string.
A few of these operators are extremely poisonous if used incorrectly.
“Not Equals” is one that gets people in trouble frequently. No faster way exists to lobotomize your AI than mixing up the concepts of “is empty”, “is not empty”, “not equals” and “not equals, case insensitive”. The first step is to convey this logic properly: understand what you want to find before rendering the logic.
If you are looking for a field that has no value – that is, an empty string – then use “is empty”. Adding “not” to the logic provides the opposite. Here is where the “not equals” operator is very important to understand before employing it: the values returned will be those that do not match the value provided. That can include empty values, and it will not catch near-matches that differ only in case. Use the “case insensitive” version to match regardless of case considerations.
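Laid out as plain predicates, the distinctions look roughly like this – a sketch of the logic only, not any product’s implementation:

```python
def is_empty(value: str) -> bool:
    """True when the field holds an empty string."""
    return value == ""


def is_not_empty(value: str) -> bool:
    """The opposite: the field holds some value."""
    return value != ""


def not_equals(value: str, target: str) -> bool:
    """Case sensitive: 'Admin' vs 'admin' counts as not-equal."""
    return value != target


def not_equals_ci(value: str, target: str) -> bool:
    """Case insensitive: 'Admin' vs 'admin' is treated as a match."""
    return value.lower() != target.lower()
```

Note the trap the article warns about: `not_equals("", "admin")` is true, so “not equals” happily returns records where the field is empty, too.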
“In” also requires some care. If the values evaluated are not in the list you provide, then a match won’t be returned. Make sure that is the logic you expect before implementation. Use the “case insensitive” version where that is important to your logic.
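The case-sensitivity pitfall for “In” looks like this in miniature – again an illustration of the logic, not a vendor API:

```python
def in_list(value: str, allowed: list[str]) -> bool:
    """Case sensitive membership: 'Admin' is NOT in ['admin', 'root']."""
    return value in allowed


def in_list_ci(value: str, allowed: list[str]) -> bool:
    """Case insensitive: normalize both sides before comparing."""
    return value.lower() in {v.lower() for v in allowed}
```

A value that differs only in case silently fails the case-sensitive check, which is exactly the kind of logic error that is hard to spot once the rule is live.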
Don’t Let Regex Become a Stumbling Block
The greatest potential for failure comes with the “match” or “match, case insensitive” operators. Here is where our savviness at regular expressions, better known as regex, can get us in trouble. First, remember that the tool uses the Python flavor of regex – the 2.7 version when I last checked. That means paying attention to differences that might exist if you are used to PCRE/PHP flavors of regex.
Keep the basic rules of writing regex in mind when building these types of matching. First rule of thumb is the longer the regex, the higher the chance of it failing. Small, carefully crafted regular expressions are more agile and likely to succeed than highly complicated ones. Don’t try to slam all your logic into one giant regex. Break it down and remember the workflow of orchestration rules. Remember, train AI, don’t feed it poison.
Everything starts with data, which is normalized after collection. Then data is filtered, followed by correlation steps that suppress, respond, or act before alarm logic is brought into play. Try not to use regex at the filtering level; you run a high risk of losing critical data if the regex goes awry. Always focus this type of logic in the correlation stages.
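That ordering can be sketched as a tiny pipeline. The stage names and sample fields below are illustrative, but they show why a regex mistake at the filtering stage loses data while the same mistake at correlation only affects alarms:

```python
import re


def normalize(raw: str) -> dict:
    """Toy normalizer: 'src message' -> a structured event."""
    src, _, message = raw.partition(" ")
    return {"src": src, "message": message}


def filter_keep(event: dict) -> bool:
    """Filtering stage: simple field checks only -- no regex here,
    so a bad pattern can never silently drop data at the source."""
    return event["src"] != ""


def correlate(event: dict) -> bool:
    """Correlation stage: regex logic lives here, where a mistake
    affects alarms but the underlying events are still collected."""
    return re.search(r"failed login", event["message"]) is not None


def run_pipeline(raw_events: list[str]) -> list[dict]:
    events = [normalize(r) for r in raw_events]
    events = [e for e in events if filter_keep(e)]
    return [e for e in events if correlate(e)]
```

If `correlate` is wrong, you fix the pattern and the events are still there; if `filter_keep` had dropped them, they would be gone for good.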
The second rule of thumb is that if the regex is really short, it’s going to cause problems. “\d+” is awesome! It will match and match and match any number as many times as possible… and match on everything. It’s too inaccurate. It requires boundaries and further logic to make it useful.
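To see the problem, run an unanchored `\d+` against an ordinary log fragment; it grabs the first run of digits it finds, which is rarely what you meant. Adding surrounding context and a boundary pins it down (the log line and pattern are made-up examples):

```python
import re

line = "eth0 port 8080 open"

# Unanchored: matches the first run of digits -- the 0 in "eth0".
loose = re.search(r"\d+", line)

# Bounded: literal context ("port "), a length limit, and a word
# boundary confine the match to the number we actually want.
bounded = re.search(r"port (\d{1,5})\b", line)
```

The loose pattern finds “0”, not the port number; the bounded one extracts “8080”.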
The third rule of thumb is never, never, never, ever forget that you can concatenate logic and that rules can have a grouped set of conditions. Keep each regex short and break down the major components of the logic. Bind them together using “and”, “or”, “and not” and lastly, “or not” conditions.
Think through the logic.
Write it down.
The act of articulating, “I need it to match this field and that field but not this other field only when it fits in this category” helps you define where to put the conditions.
“I need it to match this field and that field” — condition 1
“but not this other field” — condition 2
“only when it fits in this category” — condition 3
That can be a group of regexes or the right mix of regex and operators defined against fields to make it accurate.
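Translating that articulation into rule logic, each clause becomes one small regex or operator check, and the boolean conditions bind them together. The fields and patterns below are hypothetical examples, not a real rule set:

```python
import re


def rule_matches(event: dict) -> bool:
    # Condition 1: "match this field and that field" -- internal source
    # plus a failed-authentication message, each its own small regex.
    src_ok = re.search(r"^10\.", event["src_ip"]) is not None
    msg_ok = re.search(r"authentication failure", event["message"]) is not None
    # Condition 2: "but not this other field" -- exclude known scanners.
    scanner = re.search(r"scanner", event["hostname"], re.IGNORECASE) is not None
    # Condition 3: "only when it fits in this category" -- a plain
    # equals operator does the job; no regex needed here.
    cat_ok = event["category"] == "authentication"
    # Bind the small pieces with and / and-not rather than one giant regex.
    return src_ok and msg_ok and (not scanner) and cat_ok
```

Each condition can be tested, reviewed, and sunset on its own, which is exactly what a single monster regex makes impossible.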
The key takeaway is to always factor in the machine learning and AI capabilities of your tools when you’re working with your SIEM, threat hunting, and other cybersecurity and intelligence tools. Remember that what you put in will dictate the outcome for your work as well as for others on your team, now and in the future. Only when you give your AI quality input and the right focus and attention will you see the full scope of the good things it can do to help keep your organization secure.