Mastering SIEM: From Planning to Implementing and Managing

Introduction to SIEM

Is it pronounced ‘sim’, ‘seem’, or ‘s-i-e-m’? I’m pulling for ‘sim’, but we’ll be diving in much deeper than just how to pronounce this acronym. There are numerous articles, blog posts, solution provider writeups, and many other resources out there surrounding SIEM solutions. We’re here to help demystify the topic, look at it from both a technical and a managerial perspective, and discuss a few helpful notes that I wish I had known instead of struggling through them myself, so we can all master the SIEM.

What is a SIEM?

We’ll start off by expanding the acronym: SIEM stands for Security Information and Event Management. At a very high level, this type of tool monitors assets and resources of all types throughout your environment for security information and events, and it pulls this data into a centralized location. This means events and information from firewalls, applications, databases, Active Directory, servers, and more are all collected and sent to one centralized solution.

Why is this Beneficial?

Correlation - With all events and security information being sent to, and stored in, one location, activity can be easily correlated across the environment. For example, if you need to track a specific user account throughout the environment, you can head over to your SIEM solution and see where this user account logged in from, where it logged in to, what activity it was involved with, and so much more.

Quicker Detection - Building off the previous point, the ease of correlation and automated analysis allows for much quicker detection of potential incidents. To put this into perspective, one failed logon to one computer may not be concerning, but with a SIEM in place, you might see that one user has failed logins on hundreds of workstations and critical servers in the last 20 minutes, which is a much bigger concern.

Alert Prioritization - How to prioritize alerts, incidents, and investigations is always a hot topic in cybersecurity. A SIEM may be able to assist by providing additional context around alerts. For example, if there’s an alert for a high number of failed login attempts, you could quickly determine what type of account this is, where the attempts are coming from, and whether this is normal for that user account, and then place a more accurate priority on its investigation.

Automated Responses - We’ll be looking into Security Orchestration, Automation, and Response (SOAR) in a future blog post, at which time I’ll add a link to it here. SIEMs often have automation capabilities which can greatly assist you with investigation and response. There are almost limitless possibilities when discussing automation in this space, but a common example is that when a user account alerts for unfamiliar sign-in properties, the user account’s sessions are immediately revoked, which terminates any active sessions threat actors may have for that account. That’s not to say they can’t get back in, but you’re buying time and putting up obstacles until you can fully investigate and remediate.

Regulatory and Compliance Requirements - Depending on your organization, industry and location, there may be requirements around collecting and storing logs which can be addressed by implementing and maintaining a SIEM solution.
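To make the automated response example above a bit more concrete, here is a minimal Python sketch of a single playbook step. Everything in it is an assumption for illustration: the alert fields, the `unfamiliar_sign_in_properties` type string, and the `revoke_sessions` callable (which stands in for whatever your identity provider or SOAR tool actually exposes) are hypothetical and not taken from any specific product.

```python
# Hypothetical playbook step; alert fields and names are illustrative assumptions.
def respond_to_alert(alert, revoke_sessions):
    """Revoke a user's sessions when the alert is for unfamiliar sign-in properties.

    `revoke_sessions` is a callable supplied by the caller (for example, a wrapper
    around your identity provider). It is not a real library function.
    Returns True if an automated action was taken, otherwise False.
    """
    if alert.get("type") == "unfamiliar_sign_in_properties":
        revoke_sessions(alert["user"])
        return True
    return False


# Stand-in for a real integration: just record who would have been revoked.
revoked = []
took_action = respond_to_alert(
    {"type": "unfamiliar_sign_in_properties", "user": "jdoe"},
    revoke_sessions=revoked.append,
)
ignored = respond_to_alert(
    {"type": "disk_full", "user": "jdoe"},
    revoke_sessions=revoked.append,
)
print(took_action, ignored, revoked)
```

Passing the revoke action in as a parameter keeps the decision logic separate from the integration, so the same step can be tested without touching a live identity system.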

How Does This Work?

Let’s start off by breaking down agent-based vs agentless monitoring. With agent-based monitoring, there is an actual service or software running on the device being monitored, and that agent typically sends its data to the central server, which is why this is often called the push method. With agentless monitoring, there is no service or software running on the device being monitored; instead, the central server initiates the connection and retrieves the data, which is why this is often called the pull method.

1. Data Collection: Obtain data from servers, network devices, applications, security tools, etc. via log forwarders/collectors/agents on devices, APIs, and event streams
2. Data Aggregation: Pull all the data collected into a centralized location
3. Analysis: Review all data, attempt to establish relationships and trends, identify anomalies and threats, and generate alerts
4. Review Analysis to Identify and Investigate Threats: The alerts and analysis may provide indications of breaches, compromises, or other threats that require additional investigation

Data Collection

Agent-based monitoring is the most common method when it comes to SIEM data collection. It offers less chance for logs to be tampered with before they are collected, and logs can be filtered at the source so only the desired logs and events are collected, which can save storage space. This does not mean that agentless has no place in SIEM monitoring, because it certainly does. Agentless operates without services or software installed, so it may be lighter on a device’s resources, and it is common in cloud environments and applications where security information and events can be shared with the SIEM via an Application Programming Interface (API).
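As a rough illustration of filtering at the source, here is a minimal Python sketch of what an agent might do before forwarding events. The event dictionaries, the `WANTED_EVENT_IDS` set, and the `filter_events` function are all made up for this example and do not reflect how any particular SIEM agent is configured.

```python
# Hypothetical agent-side filter; names and event shapes are illustrative only.
WANTED_EVENT_IDS = {4625, 4771, 5001}  # assumed set of events worth forwarding


def filter_events(events, wanted_ids=WANTED_EVENT_IDS):
    """Return only the events whose ID is in the wanted set."""
    return [e for e in events if e.get("event_id") in wanted_ids]


raw_events = [
    {"event_id": 4625, "host": "ws-01"},
    {"event_id": 7036, "host": "ws-01"},  # treated as noise in this example
    {"event_id": 5001, "host": "ws-02"},
]
forwarded = filter_events(raw_events)
print(len(forwarded))  # 2
```

In practice this kind of filtering is usually expressed in the agent’s own configuration rather than in code you write, but the idea is the same: drop what you don’t need before it ever reaches storage.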

Data Aggregation

Aggregation is the next step in this process: the data collected from multiple assets throughout the environment is brought into one centralized location. Regardless of how the data is collected, the purpose here is to get it all pulled into the same location so the next step can happen.

Analysis

This is where the magic happens, as the SIEM reviews all of the data that has been aggregated. It looks for relationships among the events and data, establishes trends, and tries to identify anomalies. Through this process, incidents and threats can be identified, which in turn generate alerts.
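Here is a small Python sketch of the kind of correlation described earlier, where one user fails to log in across many hosts in a short time. The event fields (`user`, `host`, `outcome`, `time`), the 20-minute window, and the `min_hosts` threshold are assumptions picked for illustration; a real SIEM would express this in its own rule or query language.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def users_failing_on_many_hosts(events, now, window=timedelta(minutes=20), min_hosts=3):
    """Map each user to the distinct hosts with failed logons inside the window,
    keeping only users who reach min_hosts."""
    hosts_by_user = defaultdict(set)
    for e in events:
        age = now - e["time"]
        if e["outcome"] == "failure" and timedelta(0) <= age <= window:
            hosts_by_user[e["user"]].add(e["host"])
    return {u: h for u, h in hosts_by_user.items() if len(h) >= min_hosts}


now = datetime(2024, 1, 1, 12, 0)
events = [
    {"user": "jdoe", "host": "ws-01", "outcome": "failure", "time": datetime(2024, 1, 1, 11, 45)},
    {"user": "jdoe", "host": "ws-02", "outcome": "failure", "time": datetime(2024, 1, 1, 11, 50)},
    {"user": "jdoe", "host": "srv-01", "outcome": "failure", "time": datetime(2024, 1, 1, 11, 55)},
    {"user": "jdoe", "host": "ws-09", "outcome": "failure", "time": datetime(2024, 1, 1, 10, 0)},  # too old
    {"user": "asmith", "host": "ws-03", "outcome": "failure", "time": datetime(2024, 1, 1, 11, 58)},
]
flagged = users_failing_on_many_hosts(events, now)
print(sorted(flagged))  # ['jdoe']
```

A single failure for asmith stays below the threshold, while jdoe’s failures on three different hosts inside the window are flagged together, which is exactly the view you don’t get when looking at one machine at a time.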

Review Analysis to Identify and Investigate Threats

Depending on the automation in place and the alert, it may be up to us to review the analysis, investigate further, and take action. For example, an alert for numerous failed login attempts followed by a successful one may require us to look further into the account and review the source device, which might reveal that data exfiltration is just starting to take place.
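The "numerous failed logins followed by a success" pattern can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the events are already sorted by time, each has a `user` and an `outcome` field, and five consecutive failures is the threshold.

```python
def failures_then_success(events, min_failures=5):
    """Return users whose successful logon was immediately preceded by at least
    min_failures consecutive failed logons. Events must be sorted by time."""
    streak = {}      # user -> current run of consecutive failures
    flagged = set()
    for e in events:
        user = e["user"]
        if e["outcome"] == "failure":
            streak[user] = streak.get(user, 0) + 1
        else:
            if streak.get(user, 0) >= min_failures:
                flagged.add(user)
            streak[user] = 0  # a success resets the run
    return flagged


events = (
    [{"user": "jdoe", "outcome": "failure"}] * 5
    + [{"user": "asmith", "outcome": "failure"}]
    + [{"user": "jdoe", "outcome": "success"}]
    + [{"user": "asmith", "outcome": "success"}]
)
suspicious = failures_then_success(events)
print(suspicious)  # {'jdoe'}
```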

Selection, Implementation, and Management Strategy

Plan

Before jumping right into selecting and deploying a SIEM, we need to determine what the parameters of the project are and let it help guide us to the right SIEM solution. Here are some helpful points that should be taken into consideration when forming your initial plan:

Determine why you want to implement a SIEM solution
- Is this for better visibility, to help meet regulatory or compliance requirements, to better monitor specific activity, etc.?
- If you are doing this to help meet regulatory or compliance requirements, I highly recommend you read and fully understand the specific requirements around this so you can adequately meet them
Identify what you want to monitor
- Do you want to monitor only business critical servers, all servers, firewalls, databases, Active Directory, etc.?
- If you do not yet have a full Information Technology inventory, this is where I would highly recommend putting one together
Take into account your organization’s strategy, vision, and future plans
- Do you plan on expanding to additional physical locations, moving more into the cloud, growing your remote workforce, etc.?
Understand how long you should, or need to, store logs for
- Many regulatory frameworks, compliance standards, and cyber insurance policies specify retention times for logs, so be sure to check these
Understand the differences between hot and cold storage as this may be referenced in the requirements.
- Hot storage (such as immediate cloud storage) is quickly and easily accessible whereas cold storage (such as hard disk drives stored offsite) is cheaper but more cumbersome to access.
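For the retention and hot vs cold storage points above, a tiny Python sketch can help frame the conversation with stakeholders. The 90-day hot window and 365-day total retention are placeholder numbers I picked for illustration, not requirements from any regulation, and `storage_tier` is a hypothetical helper.

```python
def storage_tier(age_days, hot_days=90, retention_days=365):
    """Decide where a log of a given age belongs.

    hot_days and retention_days are placeholder values; substitute whatever
    your regulatory, compliance, or insurance requirements actually specify.
    """
    if age_days < 0:
        raise ValueError("age_days cannot be negative")
    if age_days <= hot_days:
        return "hot"      # quickly searchable
    if age_days <= retention_days:
        return "cold"     # cheaper, slower to retrieve
    return "expired"      # past retention, eligible for deletion


print(storage_tier(10), storage_tier(200), storage_tier(400))  # hot cold expired
```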

Solution Provider Selection

Knowing the answers to the above points can help narrow down which SIEM solution providers are a potential fit, and it will greatly help when getting accurate cost estimates and performing the initial comparison. A few topics to consider at this stage are as follows:

On-Premises vs Cloud-Based Solutions

This is a topic that can be entirely unique to your specific environment and needs. They both have pros and cons, but we’ll hit a few of the main ones. On-premises solutions allow you to have complete control and ownership around this entire SIEM solution, but this also means you need to make room in your environment and have a little extra time to fully manage this. Cloud-based is often quicker to deploy, easier to scale, and more disaster tolerant, but you will likely not have full control over all aspects of your SIEM solution which can be a pro or con depending on your view.

The Financial Side of Solutions

Financials are always one of the top considerations when looking at new tools, but aside from the sticker price comparison, it might also be helpful to look further into how these solutions are handled from the accounting side of the business. Operating Expenses (OpEx) and Capital Expenditures (CapEx) may or may not mean a whole lot to you, but understanding them could assist you with getting the proper stakeholder buy-in, following your organization’s preference for IT-related expenses, and selecting the most appropriate SIEM solution for your organization.

OpEx are typically expenses that are fully accounted for in the period in which they are incurred, whereas CapEx are recorded as an asset and their cost is depreciated (aka spread out) over the asset’s projected lifetime. On-premises solutions typically carry a larger upfront cost to get started, with the potential for recurring license/software renewals after that, which usually places this type of solution in the CapEx category. Cloud-based solutions are typically subscription based, which means smaller upfront costs but recurring payments on a monthly, quarterly, or yearly basis, which usually puts them into the OpEx category.
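To see how the two look side by side on paper, here is a quick Python sketch. The dollar figures are invented purely for illustration, and I’m assuming simple straight-line depreciation (equal amounts each year), which may not match how your accounting team actually depreciates assets. It also ignores any recurring license renewals on the on-premises side.

```python
def straight_line_annual_depreciation(upfront_cost, useful_life_years):
    """Spread an upfront cost evenly across its projected lifetime (assumed method)."""
    return upfront_cost / useful_life_years


def annual_subscription_cost(monthly_fee):
    """Total recurring cost for one year of a monthly subscription."""
    return monthly_fee * 12


# Made-up numbers: a 90,000 on-premises purchase over 5 years vs 1,500 per month.
on_prem_per_year = straight_line_annual_depreciation(90_000, 5)
cloud_per_year = annual_subscription_cost(1_500)
print(on_prem_per_year, cloud_per_year)  # 18000.0 18000
```

With these example numbers the yearly figures come out the same, which highlights that the real difference is often in how and when the cost is recognized rather than in the total.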

Overall, do you want a solution with a larger upfront cost that’s viewed as a tangible asset that depreciates or do you want a solution that has a recurring expense and is viewed as a service rather than an asset?

The Next Steps to Select A Solution Provider

Once you have all of your parameters figured out for this project, it’s time to begin the vendor selection process. Every environment is unique and there are many paths to a SIEM, so the best next step is to research SIEM providers that offer a solution which fits your specific needs and identify what you think might be the top 3 providers for you. Reach out to each of the 3 providers, get in contact with their sales team, and start scheduling meetings. These will start off as simple meet and greets where the vendor gets to know your environment, wants, needs, and all of the topics previously mentioned in this article.

I highly recommend that, after each vendor provides you with a demo of their solution, you request a Proof of Concept (PoC) to be conducted within your environment. An excellent test is to have them tie into your domain controller and set up their default alerting. From here, battle test their solutions with as many identity-related events as you can, such as numerous failed login attempts within a short amount of time, anomalous travel simulated by signing in from multiple locations with a VPN, and other activity that could indicate malicious behavior.

Implementation/Deployment

As with all things, have a plan.

Start Small

The goal here is to start small, such as with just one system that’s not business critical in case things go awry. This will allow you to set up your SIEM instance, get familiar with the deployment process, and begin understanding this new tool. Once you get comfortable, you could begin deploying to a few more devices, preferably to different types of assets such as a domain controller, a database, a firewall, and an application.

Establish a Baseline, Begin Fine-Tuning, and Start on SOPs

The reasoning behind deploying to a device or two of each type is that it allows you to see all of the logs being ingested from each type of asset without being completely overwhelmed with data and logs. Here is where you can start to establish a baseline of what is normal activity, but you can also begin fine-tuning log collection and alerts. You may find that you’re collecting too many irrelevant logs that are just creating extra noise for you or you may find that you’re not getting the events you wanted. Now is a great time to start setting up custom alerts if you want to keep a closer eye on certain activities or events than what came pre-packaged.
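One simple way to think about a baseline is "how far is today from what we normally see?" The Python sketch below flags a daily count that sits more than three standard deviations above the historical average. The sample counts and the three-sigma threshold are assumptions for illustration; most SIEMs have their own, more sophisticated baselining built in.

```python
from statistics import mean, pstdev


def is_above_baseline(history, today, sigmas=3):
    """Return True when today's count exceeds mean + sigmas * standard deviation."""
    threshold = mean(history) + sigmas * pstdev(history)
    return today > threshold


# Made-up daily counts of failed logons for one asset type.
daily_failed_logons = [100, 110, 95, 105, 90]
print(is_above_baseline(daily_failed_logons, 115))  # False
print(is_above_baseline(daily_failed_logons, 250))  # True
```

A rule like this is only as good as the history behind it, which is one more reason to let logs accumulate on a few representative devices before tuning alerts.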

This is also an excellent time to start working on creating Standard Operating Procedures (SOPs) and Playbooks. Since you’re starting to see events, customize alerting, and see the default alerts in action, you will have a pretty good idea of what types of actionable information you will get out of your SIEM. Take those actionable alerts and create documentation of how you or your team should respond to them so that when the time comes to respond, you can be sure it’s a thorough, complete, and quick response.

Continue the Deployment Until Completion

As you see fit, continue deploying agents and onboarding assets to be monitored by the SIEM. I highly recommend doing a phased deployment so that if you do happen to run into issues, the effects are isolated and can be quickly rolled back. When people are unable to do their work because of issues caused by technology, they tend to get frustrated and can hold that frustration for some time, so the smaller the potential impact on your organization, the better. In theory, deploying agents and onboarding assets with the SIEM should not have any user-facing impacts, but those are the famous last words of many.

Regular Reviews

Setting up regular reviews around tools, processes, and documentation should be a standard practice in this field because it changes rapidly. At least once per year, set some time aside to review the logs, understand what is deemed “normal” activity, and ensure SOPs and Playbooks are still accurate and complete. There may have been a change in the endpoint detection and response solution at your organization and updating these Playbooks might have been missed, or perhaps there was a shift from on-premises file storage to SharePoint and OneDrive which would change what your organization’s “normal” traffic looks like. Many changes can affect a SIEM, so a regular review helps ensure it is properly maintained.

Helpful Notes

I mentioned previously that SIEM solutions come with a set of default alerts which may or may not be fully adequate for your environment, and this is where custom alerts come in. With SIEMs, there are many opportunities to stand up custom alerts, but I caution against putting too many in place as it could cause alert fatigue. It may be a bit of a journey to find the set of alerts that gives you the insight you need without overwhelming you.

Alert on Disabling Real-Time Protection

Oftentimes, out-of-the-box SIEM solutions do not monitor or alert on Windows Event ID 5001. According to Microsoft, this event is generated when real-time protection in Microsoft Defender is disabled. If you use Microsoft Defender as your primary solution, this would be highly beneficial to monitor as it can indicate abnormal behavior specifically targeting endpoint protections. This could be caused by a rogue user, malware, or something else, and most of those causes are not legitimate.
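A custom alert for this can be very small. The Python sketch below is only a stand-in for whatever rule language your SIEM uses. The event dictionaries and the `rtp_disabled_alerts` function are hypothetical, and I’m matching on the Defender operational log channel name as I understand it, since an event ID on its own is not unique across Windows event sources.

```python
# Assumed channel name for Microsoft Defender Antivirus events; verify in your environment.
DEFENDER_CHANNEL = "Microsoft-Windows-Windows Defender/Operational"
RTP_DISABLED_EVENT_ID = 5001


def rtp_disabled_alerts(events):
    """Build one alert message per 'real-time protection disabled' event."""
    return [
        f"Real-time protection disabled on {e['host']}"
        for e in events
        if e.get("channel") == DEFENDER_CHANNEL
        and e.get("event_id") == RTP_DISABLED_EVENT_ID
    ]


events = [
    {"channel": DEFENDER_CHANNEL, "event_id": 5001, "host": "ws-07"},
    {"channel": "Application", "event_id": 5001, "host": "ws-08"},    # different source
    {"channel": DEFENDER_CHANNEL, "event_id": 1116, "host": "ws-09"},  # different Defender event
]
alerts = rtp_disabled_alerts(events)
print(alerts)  # ['Real-time protection disabled on ws-07']
```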

RDP Login Attempt Alert Frustrations

One frustration I encountered was trying to detect failed authentication attempts over the Remote Desktop Protocol (RDP). The scenario is that a user account attempts to establish an RDP session with another host using the correct hostname and username, but a bad password. This was not generating Windows Event ID 4625, so I was not being alerted to failed RDP attempts with the set of alerts I had in place.

When Network Level Authentication is in use, RDP relies on the Credential Security Support Provider (CredSSP) to securely delegate user credentials from a client to a target server for remote authentication. Here's how it works:

Initial Connection: When a client initiates an RDP session, it first establishes a secure channel using TLS (Transport Layer Security).
CredSSP Authentication:
1. The client and server (aka target device) negotiate the use of CredSSP for authentication over the secure TLS channel.
2. The client uses CredSSP to securely package the user's credentials and send them to the server.
Delegation of Credentials:
1. CredSSP allows the client to delegate the user's credentials to the server. This means that the server can use these credentials to authenticate the user against an authentication provider (like Active Directory).
Server Authentication:
1. The server uses the received credentials to perform the actual authentication using either NTLM or Kerberos, depending on the environment and configuration.
2. Kerberos is the preferred mechanism for authentication with NTLM as the backup mechanism.

The reason I was unable to alert on failed RDP authentications was that Kerberos is the preferred mechanism for authentication here. This means that when the server receives the credentials, it passes them off to the authentication provider like Active Directory (which is on a domain controller). If authentication fails, it fails on the domain controller, not the target device. Because of where the authentication failure happens, the event ID and location of this event can vary.

In my case, Kerberos was being used, which means the failed authentication event was on the domain controller and it was actually Event ID 4771. If Kerberos were unavailable, CredSSP would leverage NTLM as the backup mechanism, which means the target device would log Event ID 4625.
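The practical takeaway is that an alert for failed RDP logins needs to look in both places. Here is a minimal Python sketch of that idea. The event shape and the `failed_auth_sources` function are hypothetical, and keep in mind that Event ID 4771 covers Kerberos pre-authentication failures in general, not only those triggered by RDP.

```python
KERBEROS_PREAUTH_FAILED = 4771  # logged on the domain controller
LOGON_FAILED = 4625             # logged on the target device (e.g. NTLM fallback)


def failed_auth_sources(events, user):
    """List (where, host) pairs for a user's failed authentications."""
    sources = []
    for e in events:
        if e["user"] != user:
            continue
        if e["event_id"] == KERBEROS_PREAUTH_FAILED:
            sources.append(("domain controller", e["host"]))
        elif e["event_id"] == LOGON_FAILED:
            sources.append(("target device", e["host"]))
    return sources


events = [
    {"event_id": 4771, "user": "jdoe", "host": "dc-01"},
    {"event_id": 4625, "user": "jdoe", "host": "srv-rdp-02"},
    {"event_id": 4624, "user": "jdoe", "host": "srv-rdp-02"},  # successful logon, ignored
    {"event_id": 4771, "user": "asmith", "host": "dc-01"},
]
jdoe_failures = failed_auth_sources(events, "jdoe")
print(jdoe_failures)
```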

Monitoring Privileged Groups

I wanted to stand up a few custom alerts around privileged groups, such as when a member is added, since this can indicate elevation of privileges, and to generally keep tabs on these groups. The part that I didn’t realize when first undertaking this task was that there are specific Windows Event IDs for each action on each type of group. For example, there is one event ID for adding a new user to a local privileged group, another for adding a new user to a global privileged group, and another for adding a new user to a universal privileged group.

As a result, I created 3 separate alerts, each covering multiple event IDs:

‘User added to privileged group’ that monitors for event IDs 4728 global, 4732 local, 4756 universal
‘User removed from privileged group’ that monitors for event IDs 4729 global, 4733 local, 4757 universal
‘Privileged group created, modified, or deleted’ that monitors for event IDs:
- AD group created: 4731 local, 4727 global, 4754 universal
- AD group deleted: 4734 local, 4730 global, 4758 universal
- AD group modified: 4735 local, 4737 global, 4755 universal
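The three alerts above boil down to a lookup table, which is easy to sketch in Python. The event IDs follow the Windows security group management events for security-enabled groups. The `PRIVILEGED_GROUPS` set, the event dictionary shape, and the `alert_for` function are my own illustrative assumptions, and you would swap in the groups that matter in your environment.

```python
# Event ID -> (action, group scope) for security-enabled groups.
GROUP_EVENT_IDS = {
    4728: ("member_added", "global"),
    4732: ("member_added", "local"),
    4756: ("member_added", "universal"),
    4729: ("member_removed", "global"),
    4733: ("member_removed", "local"),
    4757: ("member_removed", "universal"),
    4727: ("created", "global"),
    4731: ("created", "local"),
    4754: ("created", "universal"),
    4737: ("modified", "global"),
    4735: ("modified", "local"),
    4755: ("modified", "universal"),
    4730: ("deleted", "global"),
    4734: ("deleted", "local"),
    4758: ("deleted", "universal"),
}

ALERT_NAMES = {
    "member_added": "User added to privileged group",
    "member_removed": "User removed from privileged group",
    "created": "Privileged group created, modified, or deleted",
    "modified": "Privileged group created, modified, or deleted",
    "deleted": "Privileged group created, modified, or deleted",
}

# Example watch list only; adjust to your environment.
PRIVILEGED_GROUPS = {"Domain Admins", "Enterprise Admins", "Administrators"}


def alert_for(event):
    """Return an alert string for a privileged-group event, or None if not relevant."""
    if event["group"] not in PRIVILEGED_GROUPS:
        return None
    info = GROUP_EVENT_IDS.get(event["event_id"])
    if info is None:
        return None
    action, scope = info
    return f"{ALERT_NAMES[action]} ({scope} group: {event['group']})"


print(alert_for({"event_id": 4728, "group": "Domain Admins"}))
print(alert_for({"event_id": 4728, "group": "Sales"}))  # None
```

Keeping the mapping in one table makes it harder to miss a scope, which is the mistake I nearly made when I first built these alerts.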

For a list of the built-in and default accounts and groups in Active Directory, I recommend going straight to the source which is a Microsoft Learn article here: https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/plan/security-best-practices/appendix-b--privileged-accounts-and-groups-in-active-directory