AI in Cyber Security detection and response

Ask generative AI about its role in the SOC, and the words "revolutionary" and "transformative" appear early in the description. This article describes actual use cases; unfortunately, the revolution has not yet arrived.

The industry describes the role of AI in the SOC under the following topics:

Threat Detection
Threat Hunting
Incident Response
Threat intelligence
Alert management
False positive reduction
Forensic analysis
Automation

This article will review use cases of AI, based on actual SOC usage of the latest and greatest technologies by the leading vendors in an environment serving over 100,000 users in over 60 organizations.

1. Threat detection

At the 2014 RSA Conference in San Francisco, walking through the booths in search of innovative detection solutions and cool t-shirts, I found one that read “I broke the rules”. Both company and t-shirt promised that SIEM rules, prevalent since the early 2000s, were evolving into machine-based analytics (sometimes referred to as ML or AI), changing the way we create and manage detection rules.

Threat detection is a wide topic; we will limit the review scope to (near) real-time alert and incident creation.

The best place to look for AI success in alert creation is the large SIEM / XDR vendors, who lead the way in advancing the use of AI.

Approach

As we are working with multiple vendors who use AI and ML, the SOC has years of experience investigating ML based incidents.

We chose a 5,000-user organization with multiple vendors that create AI-based alerts and reviewed one year of incidents to learn more about the quality of AI-created alerts and incidents.

Statistics:

~100 GB logs collected per day to a cloud SIEM.
- Collection - 20 unique vendors.
- Collection - 75 unique products.
~13,000 incidents created.
- ~7,800 tickets opened (~5,200 incidents automatically closed or deduplicated).
524 tickets escalated, of which:
- 106 AI-based tickets escalated,
- 418 rule-based incidents escalated.
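The proportions behind these figures can be worked out directly. The short sketch below uses only the numbers listed above (treating the approximate "~" values as exact); the variable names are ours and purely illustrative.

```python
# Figures copied from the statistics above ("~" values treated as exact).
incidents_created = 13_000
tickets_opened = 7_800
auto_closed = 5_200
escalated_total = 524
escalated_ai = 106
escalated_rule = 418

# Sanity checks: each breakdown should add up to its total.
assert tickets_opened + auto_closed == incidents_created
assert escalated_ai + escalated_rule == escalated_total

auto_close_rate = auto_closed / incidents_created   # closed without a ticket
escalation_rate = escalated_total / tickets_opened  # tickets that were escalated
ai_share = escalated_ai / escalated_total           # AI-based share of escalations

print(f"Automatically closed or deduplicated: {auto_close_rate:.0%}")  # 40%
print(f"Tickets escalated: {escalation_rate:.1%}")                     # 6.7%
print(f"AI-based share of escalations: {ai_share:.1%}")                # 20.2%
```

In other words, roughly one in five escalated tickets originated from an AI / ML detection.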

AI / ML created the following alert types:

Unfamiliar client properties.
Atypical travel.
Rare operations.
New behavior.
Uncommon process action.
Large upload.

A review of the final resolutions of all escalated tickets found no true positives based on AI detection.

Hypothesis

Tickets that the SOC closed without escalating were either false positives or, alternatively, missed through human error, at an equal rate across all incident types.

Conclusion

AI / ML can detect deviations from baselines that cannot otherwise be detected. These detections are highly effective for improving understanding of the environment and enriching incidents. However, anomalies do not imply maliciousness; they have to be additionally correlated and investigated to reduce false positives. The analysis of AI / ML incidents has shown that these technologies still require significant improvement, particularly in the quality of detection and the ability to identify real threats in billions of noisy logs. While AI / ML can assist SOC teams in threat detection, it is not yet capable of eliminating rule creation.

2. Threat Hunting

We define “hunting” or “threat hunting” as the asynchronous detection of attacks or compromise using different techniques that include:

Event anomaly
Indicators of compromise or attack indicators
Pivoting on a known bad (alert, incident, intelligence)
Hypothesis investigation

Successful hunting requires detecting anomalies and connecting the dots, and AI / ML excels at both. Combining different artifacts or minor alerts into an incident could itself be defined as “detection”, and some vendors do exactly that; here we are referring to the more manual or asynchronous process of detection. In essence, though, in anomaly hunting or pivoting, analysts stitch and correlate manually, which, in our view, can be done better using ML after proper training.

AI and ML assist in every type of threat hunting mentioned, because even when pivoting or hitting on an indicator, understanding prevalence, rarity or anomalous behavior can be key to a successful hunt. For example, suppose an IP address related to a specific malware was accessed by a number of assets. Understanding whether these assets differ from other assets, or whether the process making the connection is rare, is a task ML is efficient at. Combining indicators that create false positives with behavioral prevalence and rarity is uniquely achievable using ML.
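The rarity check in the example above can be sketched with a simple prevalence statistic. This is a deliberately minimal stand-in for what an ML model would do at scale, not a vendor implementation; the event layout, host names, process names, IP addresses and the 5% threshold are all assumptions made for illustration.

```python
from collections import defaultdict

def process_prevalence(events):
    """Return {process: fraction of all assets on which the process was seen}."""
    assets_by_process = defaultdict(set)
    all_assets = set()
    for asset, process, _dest_ip in events:
        all_assets.add(asset)
        assets_by_process[process].add(asset)
    return {p: len(a) / len(all_assets) for p, a in assets_by_process.items()}

def rare_connections(events, indicator_ip, max_prevalence=0.05):
    """Connections to the indicator made by processes seen on few assets."""
    prevalence = process_prevalence(events)
    return [
        (asset, process)
        for asset, process, dest_ip in events
        if dest_ip == indicator_ip and prevalence[process] <= max_prevalence
    ]

# Hypothetical telemetry: (asset, process, destination IP).
INDICATOR = "203.0.113.10"  # documentation-range address standing in for a known-bad IP
events = [(f"host-{i}", "chrome.exe", "198.51.100.5") for i in range(40)]
events += [(f"host-{i}", "chrome.exe", INDICATOR) for i in range(3)]
events.append(("host-7", "updater32.exe", INDICATOR))

hits = rare_connections(events, INDICATOR)
print(hits)  # [('host-7', 'updater32.exe')]
```

Four assets touched the indicator, but only one did so through a process that is rare across the environment, which is where a hunter would look first.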

3. Incident Response

Many clients regard SOC incident response as specific actions like blocking and isolation. Our definition of IR is: "The actions taken to reduce time and scope of an incident and mitigate adversaries’ malicious actions".

Without confidence, i.e. being certain of a malicious actor or compromised asset, actions like blocking or isolation would not be performed. Machine learning that correlates alerts can raise certainty to a level at which an action would be taken.

AI can also assist in scoping the breadth of a compromise by answering questions such as which assets have vulnerabilities, software or characteristics similar to the compromised asset.

After the discovery of the Log4j vulnerability, companies were busy locating vulnerable assets and prioritizing remediation based on risk. These risks can be quantified using statistical analysis of the kind AI / ML can provide.

Similarly, after a successful phishing email, AI / ML can shorten the time spent finding similar emails (based on source, subject, headers, links and content) to delete from users’ mailboxes.
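As a rough illustration of the phishing case, the sketch below finds messages whose subject closely resembles a known-bad email, using plain string similarity from Python's standard library rather than ML. The message layout, IDs, subjects and the 0.8 threshold are assumptions; a real search would also weigh source, headers, links and content.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_similar(known_bad, mailbox, threshold=0.8):
    """Return IDs of messages whose subject resembles the known-bad subject."""
    return [
        msg["id"]
        for msg in mailbox
        if similarity(msg["subject"], known_bad["subject"]) >= threshold
    ]

# Hypothetical data for illustration only.
known_bad = {"id": "m0", "subject": "Invoice 4471 overdue"}
mailbox = [
    {"id": "m1", "subject": "INVOICE 4471 OVERDUE"},
    {"id": "m2", "subject": "Invoice 4472 overdue"},
    {"id": "m3", "subject": "Team lunch on Friday"},
]

print(find_similar(known_bad, mailbox))  # ['m1', 'm2']
```

The near-duplicate with a changed invoice number is still caught, which an exact-match search would miss.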

In one incident, we detected an adversary who had taken control of an email server. As the external incident response team, we were not allowed to isolate the machine for two hours, until the business implications were understood and we had convinced the system and business stakeholders that the actions were necessary to stop the adversary.

We believe it will take time for responders to trust AI enough to allow mass isolation and potentially devastating decisions like blocking segments and networks. We already trust AI with lives when driving cars; the difference is that the log-based digital detection world is still untrusted. AI will therefore keep assisting and advising, while critical, impactful decisions will still be left to humans. As of now, we recommend considering AI / ML as an assistant rather than a decision-maker, especially when it comes to intrusive actions. We expect trust in AI actions to increase as more and more decisions in different areas of life are assisted by AI.

4. Threat intelligence

Threat intelligence is the process of identifying and analyzing cyber threats to produce strategic, tactical and operational actions and insights.

Strategic threat intelligence can foresee a trend that may affect a company's strategy, while tactical or operational intelligence may integrate into security systems to block or detect adversaries.

According to large intelligence vendors, AI is already playing a key role in threat intelligence. Some of the predominant use cases are:

Summarizing collected data to human readable insights
Guiding security teams on mitigations and next steps
Ranking and scoring files and threats based on AI models
Identifying and understanding code functions
Translating and normalizing data
Extracting entities from data collected

While some of these applications are behind the scenes for SOC teams, others are directly visible when threat intelligence platforms deliver their outputs.

5. Alert management

For this article we will define alert management as the deduplication, suppression and aggregation of multiple alerts or logs into an incident, which reduces SOC alert fatigue by leaving fewer incidents to handle.

Aggregation: summarizing hundreds or thousands of alerts into a single incident doesn’t necessarily require ML, as the attributes for aggregating exist in the logs, and the decision to include multiple alerts in an incident can be based on one or (usually) more attributes.

ML and AI assist in creating additional attributes and relationships that are not a part of the original event or alert, allowing potentially more effective aggregation.
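Attribute-based aggregation of the kind described above needs no ML at all, as this minimal sketch shows. The alert fields (name, host, user) and the sample values are assumptions chosen for illustration; the grouping attributes are passed in as a parameter.

```python
from collections import defaultdict

def aggregate(alerts, keys=("host", "user")):
    """Group alert names into incidents keyed by the chosen attributes."""
    incidents = defaultdict(list)
    for alert in alerts:
        incident_key = tuple(alert[k] for k in keys)
        incidents[incident_key].append(alert["name"])
    return dict(incidents)

# Hypothetical alerts for illustration only.
alerts = [
    {"name": "Suspicious PowerShell", "host": "wks-01", "user": "alice"},
    {"name": "Credential dump attempt", "host": "wks-01", "user": "alice"},
    {"name": "Suspicious PowerShell", "host": "wks-02", "user": "bob"},
]

incidents = aggregate(alerts)
for key, names in incidents.items():
    print(key, names)
```

Three alerts collapse into two incidents here. An ML-derived attribute, such as a peer group or a behavioral cluster, could be added as one more entry in keys once it has been attached to each alert.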

Deduplication: although different vendors boast AI-based deduplication, the reality is more complicated. A simplified example:

Over a 10-minute period, different alerts trigger from the same IP address. If the IP is a single machine, it makes sense to deduplicate and / or aggregate them into a single incident. However, if the IP address is a firewall in a segmented environment, the alerts should not be deduplicated, as they may well be different, unrelated incidents.

Similarly, alerts stemming from “guest WIFI” connecting to a malicious destination should not be deduplicated or aggregated with alerts from corporate assets on the internal network accessing the same malicious destination, as the latter may indicate corporate compromise.
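Both caveats can be expressed as a context-aware deduplication key. The sketch below assumes a hypothetical asset inventory that records what each source IP represents and which zone it belongs to; the inventory contents, field names and addresses are invented for illustration, and the 10-minute window is left out for brevity.

```python
from collections import defaultdict

# Hypothetical asset inventory: what each source IP represents.
INVENTORY = {
    "10.1.1.20": {"kind": "single_machine", "zone": "corporate"},
    "10.1.1.21": {"kind": "single_machine", "zone": "corporate"},
    "10.9.0.1": {"kind": "firewall", "zone": "corporate"},
    "172.16.5.8": {"kind": "single_machine", "zone": "guest"},
}
UNKNOWN = {"kind": "unknown", "zone": "unknown"}

def dedup_key(alert, by="src_ip"):
    """Key under which an alert may be merged with others."""
    asset = INVENTORY.get(alert["src_ip"], UNKNOWN)
    if asset["kind"] != "single_machine":
        # Firewall or unknown source: many machines may sit behind this IP,
        # so every alert keeps its own key and is never merged.
        return ("unique", alert["id"])
    # The zone is part of the key, so guest and corporate alerts stay apart.
    return (asset["zone"], alert[by])

def deduplicate(alerts, by="src_ip"):
    """Return lists of alert IDs that were merged together."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[dedup_key(alert, by)].append(alert["id"])
    return list(groups.values())

BAD = "203.0.113.10"  # documentation-range address standing in for a malicious destination
alerts = [
    {"id": 1, "src_ip": "10.1.1.20", "dst": BAD},
    {"id": 2, "src_ip": "10.1.1.20", "dst": BAD},
    {"id": 3, "src_ip": "10.9.0.1", "dst": BAD},
    {"id": 4, "src_ip": "10.9.0.1", "dst": BAD},
    {"id": 5, "src_ip": "172.16.5.8", "dst": BAD},
    {"id": 6, "src_ip": "10.1.1.21", "dst": BAD},
]

print(deduplicate(alerts))            # [[1, 2], [3], [4], [5], [6]]
print(deduplicate(alerts, by="dst"))  # [[1, 2, 6], [3], [4], [5]]
```

Alerts from the firewall address are never merged, and grouping by destination keeps the guest alert separate from the corporate ones. The hard part in practice is keeping such an inventory accurate, which is why deduplication is more complicated than it sounds.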

Suppression is similar and should also be considered so as not to flood the SOC analysts. In order to suppress, the system must understand what the attributes for suppression are. Deciding that a common process (like a browser) and a common destination (like a proxy) are a basis for alert suppression may result in missing important alerts.

6. False positive reduction

The most common way to reduce false positives is to increase incident confidence, and the most common way of boosting confidence is correlation. Correlation in the SOC usually means not creating an incident (or a ticket) until several alerts with a common attribute (usually hostname or IP) trigger within a certain timeframe, adding to the incident's fidelity.
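Threshold correlation as described above can be sketched with a sliding time window. The threshold of three alerts, the 10-minute window, the host names and the timestamps (in seconds) are assumptions for illustration only.

```python
from collections import defaultdict

def correlate(alerts, min_alerts=3, window_seconds=600):
    """Return hosts with at least min_alerts alerts inside any time window."""
    times_by_host = defaultdict(list)
    for host, timestamp in alerts:
        times_by_host[host].append(timestamp)

    incidents = []
    for host, times in times_by_host.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window from the left until it spans at most window_seconds.
            while times[end] - times[start] > window_seconds:
                start += 1
            if end - start + 1 >= min_alerts:
                incidents.append(host)
                break
    return incidents

# Hypothetical alerts: (host, timestamp in seconds).
alerts = [
    ("srv-1", 0), ("srv-1", 120), ("srv-1", 300),     # burst within 5 minutes
    ("wks-2", 0), ("wks-2", 4000), ("wks-2", 9000),   # same count, spread over hours
]

print(correlate(alerts))  # ['srv-1']
```

Both hosts raised three alerts, but only the burst crosses the threshold; the spread-out alerts on the second host never become an incident.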

What’s not commonly discussed is the impact of correlation. The underlying theme of SOC alert correlation is that the SOC is neither aware of nor addresses alerts that do not reach a certain threshold. These alerts may be crucial and relevant to incident detection but may be missed in “low and slow” attacks.

The decision to correlate has in many cases shifted to the software vendors, as modern SIEM alerting is no longer based solely on the operator's decisions and content; rather, ML and vendor-created content trigger incidents.

One important decision that vendors are making is how to reduce false positives. Using ML, vendors can profile assets and behaviors on the premise that if “everyone” does it, it must be OK; excessive alerts are thus analyzed, and attribute-based analysis can improve incident confidence. For example: if 10% of the assets are accessing a low-reputation website but only one is using a rarely seen process and uploading data, the SIEM may automatically exclude the common processes from incident creation. Reaching the same conclusion manually is sometimes possible but cumbersome. In other words, AI / ML can exclude legitimate processes and focus on a single suspicious process that is usually detected only after a human analyst's investigation, reducing analyst effort and false positives. In standard practice, the incident is triaged or investigated and, based on the outcome and analyst insights, exclusions and whitelisting are applied, reducing future false positives.
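The low-reputation website example can be sketched as an automatic exclusion step. This is a simplified statistical stand-in for vendor ML, not any product's actual logic; the event layout, process names, byte counts and the 50% cut-off for a "common" process are assumptions for illustration.

```python
from collections import Counter

def split_by_prevalence(events, common_share=0.5):
    """events: (asset, process, bytes_uploaded) for connections to one site.

    Returns the set of processes excluded as common, and the remaining
    events that also uploaded data.
    """
    assets = {asset for asset, _process, _uploaded in events}
    assets_per_process = Counter()
    # Count each (asset, process) pair once, however many events it produced.
    for _asset, process in {(a, p) for a, p, _ in events}:
        assets_per_process[process] += 1

    excluded = {
        process
        for process, count in assets_per_process.items()
        if count / len(assets) >= common_share
    }
    suspicious = [e for e in events if e[1] not in excluded and e[2] > 0]
    return excluded, suspicious

# Hypothetical connections to the same low-reputation website.
events = [(f"host-{i}", "browser.exe", 0) for i in range(10)]
events.append(("host-3", "sync_tool.exe", 5_000_000))

excluded, suspicious = split_by_prevalence(events)
print(excluded)    # {'browser.exe'}
print(suspicious)  # [('host-3', 'sync_tool.exe', 5000000)]
```

The common process is excluded without analyst input, leaving a single event for investigation. The same "everyone does it" premise is also the risk: an adversary who uses a common process would be excluded as well.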

7. Forensic analysis

A single computer may contain millions of files, including images, text files, binaries, and compressed and potentially encrypted files.

Forensic analysis, like threat hunting, can be well assisted by AI: image categorization, translation, pattern recognition, anomaly detection and language processing are some of the relevant capabilities.

The SOC unfortunately has not yet enjoyed the potential benefits of AI in computer forensics. These capabilities will have to be incorporated into commercial tools and proven in court in order to impact computer forensics, which, by its nature and limitations (the need to be forensically sound), advances more slowly than cyber security.

8. Automation

The SOC has been advancing automation for a decade, starting with scheduled scripts and evolving to the “Gartner coined” SOAR (security orchestration, automation and response) in 2017.

SOAR as a full-fledged system failed to be adopted by SOCs, as it required programming skills and cumbersome playbook creation. SOARs were either incorporated into the large vendors’ SIEMs (Splunk – Phantom, Palo Alto Networks – Demisto, Google – Siemplify, Microsoft – Logic Apps) or reinvented as low code / no code systems that also leverage ML and AI.

The next-generation SOAR allows the SOC to easily create playbooks using natural language, supporting analysts and lowering the bar for automation usage. For BDO MDR, automation closes 40%-60% of the incidents using deduplication and automatic triage.

Future evolutions may create playbooks based on analysts’ behaviors and operational procedures. These will definitely rely on AI language processing and pattern recognition.

Summary

AI and ML are changing detection and response. Some changes, like vendor incident creation, are reducing SOC interactions: SIEM vendors create the incidents, and the SOC triages and responds. Those incidents, once based only on manual rule creation, are now developed by the vendors using ML capabilities. Other changes, like automation, are increasing SOC interactions as the barriers to creating playbooks dissolve thanks to low code / no code and natural language capabilities.

As adversaries are also using AI, questions remain about how much SOC detection and response will improve, since shifting to ML / AI detection does not necessarily mean better detection. It does, however, mean detection content and visibility beyond the capabilities of your local SIEM and SOC team, as advanced ML detection is now part of the SIEM / XDR vendor portfolio.

Did AI revolutionize the SOC? Not yet.

Will AI revolutionize the SOC? Yes, as the vendors improve the ROI of AI in detection and response.