Enterprise log managers: An unsexy but vital tool
Ultimately, the goal of Enterprise Log Management (ELM) is to get your most critical events escalated to your operations staff so they can respond with the appropriate actions. In today’s enterprise, you would be culling through millions of events if you were not relying on ELM to correlate that information and point to what is most critical.
You may be asking, “Isn’t this Security Information and Event Management (SIEM)?” It’s not. Well, not entirely. ELM and SIEM are interrelated. SIEM is more concerned with the larger view of your overall security landscape, whereas ELM is focused on a specific element of security: “What is happening where?” SIEM correlates data across varying data sources and environments—a more holistic view. Therefore, ELM is a subset and critical component of a SIEM program. Not all companies require a SIEM program. However, most companies would benefit from an ELM solution.
Corporate policies, and the controls that support them, are put forth to deter or prevent undesirable activities. Translating those policies into the solution and configuring the relationships among the policy, the controls, and the data feeds from the systems and applications to be monitored are foundational steps in building an ELM. One measure of the quality of an ELM technology is how easily it interfaces with your critical systems. “How many different components does it understand?” so to speak. “How much technical expertise is required to make it deliver value?”
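To make that mapping concrete, here is a minimal sketch, in Python, of how a policy might be tied to its controls and data feeds. Every name in it (the policy identifiers, feed names, and escalation targets) is hypothetical, not drawn from any particular product.

```python
# A hypothetical sketch of the policy -> control -> data feed mapping an ELM
# configuration encodes. All names here are illustrative.
POLICY_MAP = [
    {
        "policy": "AC-01: No privileged logons outside change windows",
        "control": "Monitor domain admin authentication events",
        "data_feeds": ["windows_security_log", "vpn_gateway_syslog"],
        "escalate_to": "security_operations",
    },
    {
        "policy": "DR-03: Retain transaction logs for seven years",
        "control": "Archive application transaction logs",
        "data_feeds": ["app_server_logs"],
        "escalate_to": "it_operations",
    },
]

def feeds_for_policy(policy_prefix: str) -> list[str]:
    """Return every data feed the ELM must ingest to enforce a policy."""
    return [feed
            for entry in POLICY_MAP
            if entry["policy"].startswith(policy_prefix)
            for feed in entry["data_feeds"]]
```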
Use cases and setup
Privileged access monitoring is a classic example in which an ELM gathers logs from various systems and creates a direct workflow to the operations staff, enabling them to act on activity deemed inappropriate.
For example, a domain admin logs in outside an approved change window and then fails to authenticate several times in a row, a pattern consistent with a brute-force attack. The system must correlate those events and initiate the appropriate workflow, whatever that may be. The processes established around the solution are just as important: the log management solution is only as good as the processes and teams that support it. Typically, this requires an engineering staff and an operations staff. The engineers build and configure the ELM so the right alerts are coming through. The operations staff is then able to take the alerts and, ideally, do the “right thing.” Of course, the less mature your existing processes and workflows, the more iterations will be required. The events you consider “taggable,” that is, the events you are interested in, must tie back to corporate policy.
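As a rough illustration of that correlation step, the sketch below flags an account whose authentication failures pile up within a short window. The event shape, threshold, and window are all assumptions made for the example, not from any specific tool.

```python
from collections import defaultdict
from datetime import timedelta

FAILURE_THRESHOLD = 5          # illustrative, not a recommendation
WINDOW = timedelta(seconds=60)

def correlate_failed_logons(events):
    """Flag accounts whose authentication failures exceed the threshold
    within the sliding window. `events` is an iterable of
    (timestamp, account, event_type) tuples -- a hypothetical shape."""
    recent = defaultdict(list)   # account -> timestamps of recent failures
    alerts = []
    for ts, account, event_type in sorted(events):
        if event_type != "auth_failure":
            continue
        recent[account] = [t for t in recent[account] if ts - t <= WINDOW]
        recent[account].append(ts)
        if len(recent[account]) >= FAILURE_THRESHOLD:
            # In a real ELM this would initiate the escalation workflow.
            alerts.append((account, ts))
    return alerts
```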
The basic premise that “thou shalt not access that which you are not allowed to access” will guide the rules you develop. Activity will fall into one of three categories: transactions you don’t care about, transactions you want to know about, and transactions you want to take immediate action on. For example, you might have miskeyed your password while attempting to log in. That type of transaction is not necessarily one to be concerned about. However, if there are a thousand more attempts in the next 60 seconds, you should know something is fishy; this is likely an attacker trying to brute-force access to your valuable data. Flag it and determine what part of the organization should receive the system workflow.
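A sketch of that three-way triage might look like the following; the thresholds are illustrative, and a real rule would draw them from your corporate policy.

```python
from enum import Enum

class Disposition(Enum):
    IGNORE = "ignore"            # transactions you don't care about
    RECORD = "record"            # transactions you want to know about
    ACT = "act_immediately"      # transactions needing immediate action

def triage(failed_attempts_last_60s: int) -> Disposition:
    """Hypothetical triage of logon failures into the three categories."""
    if failed_attempts_last_60s <= 3:      # a miskeyed password
        return Disposition.IGNORE
    if failed_attempts_last_60s < 1000:    # worth recording and reviewing
        return Disposition.RECORD
    return Disposition.ACT                 # likely brute force: route to ops
```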
ELM can provide value through non-security use cases as well. There could be transactional activity that indicates a problem, such as multiple acknowledgement requests being generated as a result of a system glitch. The sheer volume could saturate the network, acting as a de facto denial-of-service attack. The ELM could flag this type of activity when it occurs so that remediation can begin preventively, potentially averting an outage of a critical service.
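One simple way such a volume spike could be detected is by comparing the current message rate to a recent baseline, as in this illustrative sketch; the window size and multiplier are arbitrary tuning knobs, not recommendations.

```python
from collections import deque

class VolumeSpikeDetector:
    """Flag message types whose rate jumps well above their recent baseline,
    e.g. a glitch generating storms of acknowledgement requests."""

    def __init__(self, window: int = 60, multiplier: float = 10.0):
        self.counts = deque(maxlen=window)   # per-interval message counts
        self.multiplier = multiplier

    def observe(self, count_this_interval: int) -> bool:
        """Return True if this interval's count far exceeds the baseline."""
        baseline = (sum(self.counts) / len(self.counts)) if self.counts else None
        self.counts.append(count_this_interval)
        if not baseline:
            return False   # no history yet, nothing to compare against
        return count_this_interval > baseline * self.multiplier
```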
A virus on the network provides an opportunity for a good ELM to demonstrate intelligence. As the tool logs virus-induced events and correlates them as a single outbreak, operations can target the affected population proactively. This approach can save hundreds or thousands of hours by solving the underlying problem instead of addressing each incident reactively. It is also the compelling value statement ITIL has put forth for decades: multiple incidents occurring for similar reasons typically represent a problem needing a solution (i.e., “problem management”).
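A toy version of that correlation: collapse per-host antivirus detections into one outbreak record per signature, so operations works a single problem instead of hundreds of incidents. The event shape is assumed for the example.

```python
from collections import defaultdict

def group_outbreaks(av_events):
    """Collapse per-host antivirus detections into one outbreak record per
    signature. `av_events` is an iterable of (host, signature, timestamp)
    tuples -- a hypothetical shape, not any product's schema."""
    outbreaks = defaultdict(set)
    for host, signature, _ts in av_events:
        outbreaks[signature].add(host)
    # One problem record per signature, listing the affected population.
    return {sig: sorted(hosts) for sig, hosts in outbreaks.items()}
```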
Requisite skills
The primary skill associated with successfully deploying an ELM is the ability to translate business use cases into the ELM tool’s language. If your environment deals with personally identifiable information, for example, privacy concerns are going to be among the highest priorities. You must understand the systems generating the data and how that data relates to the company’s use cases. For example, we don’t want people logging on as a local administrator in an Active Directory domain environment; therefore, the ELM would need to alert on the appropriate event ID. As IT professionals, we know there will always be a technology that is not commonly known and will require additional work to develop the proper interface. The resources you assign as your solution delivery leads or engineers for an ELM deployment must understand how to translate your business logic into the technical speak of your IT landscape.
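For the local-administrator example, a minimal sketch of such a rule might key on Windows Security event ID 4624 (a successful logon) and the built-in Administrator account’s well-known SID suffix of “-500”. The event dictionary shape here is illustrative, not any product’s actual schema.

```python
# Windows logs successful logons as Security event ID 4624; the built-in
# local Administrator account has a SID ending in "-500". The dict shape
# below is a hypothetical normalized event, not a real product schema.
LOCAL_ADMIN_SUFFIX = "-500"

def should_alert(event: dict) -> bool:
    """Alert on a successful logon by the built-in local Administrator."""
    return (event.get("event_id") == 4624
            and event.get("target_sid", "").endswith(LOCAL_ADMIN_SUFFIX))

sample = {"event_id": 4624,
          "target_sid": "S-1-5-21-1004336348-1177238915-682003330-500"}
assert should_alert(sample)
```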
Challenges
Scalability is the first challenge and the biggest concern in architecting the solution. Most likely, significant amounts of data will be logged. Data retention policies and growth must also be considered. Depending on your use cases, large portions of data may need to be held for very long periods of time. Therefore, consideration should be given to balancing your company’s tolerance for risk with its appetite for capital investment.
ELM systems typically work in one of two ways: data-intensive, which gathers all data to be analyzed later and thus needs to scale accordingly; and limited collection, in which agents gather only the information considered “interesting.” In the case of the former, storage will be the greater concern; for the latter, processing capabilities will need to be stronger to reduce the chances of introducing latency into transaction processing time.
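The limited-collection model can be pictured as an agent-side filter that forwards only the events deemed interesting, trading agent-side processing for central storage. The event types below are, of course, placeholders.

```python
# A sketch of the "limited collection" approach: an agent-side filter that
# ships only interesting events to the central ELM. The INTERESTING set and
# the event shape are hypothetical.
INTERESTING = {"auth_failure", "privilege_escalation", "av_detection"}

def agent_filter(raw_events):
    """Yield only the events worth forwarding to the central collector."""
    for event in raw_events:
        if event.get("type") in INTERESTING:
            yield event
```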
Many ELM solutions do not use a communications protocol that guarantees delivery, relying instead on protocols such as UDP, which can result in some data being lost. Technology and process verifications may therefore be additional requirements to factor into the design.
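One common process verification is to have senders stamp each message with a counter so the collector can detect gaps. A sketch, assuming such a counter exists (syslog itself does not guarantee one):

```python
def find_sequence_gaps(received_seq_numbers):
    """Given the sequence numbers actually received from a source, report
    gaps that indicate messages lost in transit (e.g. over UDP syslog).
    Assumes the sender stamps messages with a monotonically increasing
    counter -- an assumption for this sketch, not a syslog feature."""
    gaps = []
    ordered = sorted(set(received_seq_numbers))
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))  # inclusive range of missing seqs
    return gaps

assert find_sequence_gaps([1, 2, 5, 6, 9]) == [(3, 4), (7, 8)]
```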
Of course, having well-defined expectations will determine the perceived success of the implementation. Implementing such a solution in a company that has limited policies and procedures will have little success, as there will be few rules to correlate the activity against. Define your solution delivery success criteria early and make sure what you choose is measurable. Consider using a governance and management framework such as COBIT 5 to guide the initiative.
Conclusion
Some ELMs come with standard rule sets that can accelerate implementation. Refining those rule sets to reflect your organization’s corporate policies will drive the migration from focused manual intervention to true problem management. In this manner, not only will ELM implementers see a reduction in time spent resolving incidents, but their responsiveness will be seen as more proactive than reactive. As a result, these shops should see a reduction in incident management costs. And, of course, when the solution is implemented correctly, security issues will decrease overall and compliance capabilities will improve.