
WÜRTHPHOENIX NetEye: Our Elastic Stack Story


This article was published on JAXenter.com on January 27.

Market Requirements – Why Log Management?

It all began with a regulation issued in 2008 by the Italian data protection authority (the "Garante per la protezione dei dati personali"). This regulation stipulates that all companies must log all system access by administrators and keep the logs archived for at least six months. This approach is intended to facilitate and standardize the monitoring of system administrators' activities and, above all, protect sensitive company data. In other words: Security Auditing.

Carrying out a detailed analysis of the requirements stipulated by the data protection authorities allowed us to identify the following four categories:

  • Logging and compilation of events (log on, log off, authentication failure) in a heterogeneous system environment (Windows, Unix, Linux; databases such as Oracle, DB2, and MySQL; Lotus Notes, SAP, firewalls, etc.)
  • Central storage of events in all systems, databases, and network devices
  • Indexing events to permit fast searches and the detection of anomalies
  • Enabling auditors to carry out targeted searches for specific attributes (e.g. host name, user, time, etc.)
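
The last two requirements, indexing and targeted search by attribute, can be illustrated with a very small in-memory index. This is only a sketch of the idea: the event fields (`host`, `user`, `action`, `time`) and the sample records are hypothetical, and a real deployment would rely on a search engine rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical event records; field names and values are illustrative only.
EVENTS = [
    {"host": "srv01", "user": "admin", "action": "logon", "time": "2010-03-01T08:00:00"},
    {"host": "srv01", "user": "admin", "action": "logoff", "time": "2010-03-01T08:30:00"},
    {"host": "db02", "user": "root", "action": "auth_failure", "time": "2010-03-01T09:15:00"},
]


def build_index(events):
    """Map each (attribute, value) pair to the positions of the matching events."""
    index = defaultdict(set)
    for pos, event in enumerate(events):
        for key, value in event.items():
            index[(key, value)].add(pos)
    return index


def search(index, events, **criteria):
    """Return the events that match every attribute=value criterion."""
    if not criteria:
        return list(events)
    matches = None
    for key, value in criteria.items():
        hits = index.get((key, value), set())
        matches = hits if matches is None else matches & hits
    return [events[pos] for pos in sorted(matches)]


INDEX = build_index(EVENTS)
admin_on_srv01 = search(INDEX, EVENTS, host="srv01", user="admin")
```

An auditor's question such as "what did `admin` do on `srv01`?" then becomes a lookup by two attributes instead of a scan over raw log files.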

In 2010, we were therefore faced with the challenge of providing our customers with an appropriate solution for all these categories within our NetEye IT Systems Management solution.


Existing Options – What was already available in the world of open source?

As our NetEye monitoring tool is based on a number of open source modules, it seemed a natural first step for us to take an in-depth look at the existing options in the world of open source. We wanted to find out about the tools that were already available for gathering log data. Our research revealed that Snare and Epilog were suitable for collecting logs. This combination presented some limitations, however:

  • Reliable, secure communication was not guaranteed (TLS-TCP, SYSLOG RELP problem)  
  • It was not possible to implement agent-side filtering (which only collects those events that are actually required so as not to burden the network and the server unnecessarily)
  • No central, simple way to configure both agents on different operating systems
  • It was not possible to monitor the agents themselves
  • Auto-discovery of administrator roles was not possible in Windows
  • The installation of two agents (Snare and Epilog) was unnecessarily complex
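
Agent-side filtering, one of the capabilities missing from the list above, means deciding on the monitored host which events are worth sending at all. The following is a minimal sketch of that idea and not the SAFED implementation; the `action` field and the set of wanted actions are assumptions chosen for illustration.

```python
# Hypothetical set of event types the central server actually needs.
WANTED_ACTIONS = {"logon", "logoff", "auth_failure"}


def should_forward(event, wanted=WANTED_ACTIONS):
    """Decide on the agent whether an event is relevant for the server."""
    return event.get("action") in wanted


def filter_events(events, wanted=WANTED_ACTIONS):
    """Keep only relevant events so the network and server are not burdened."""
    return [event for event in events if should_forward(event, wanted)]


local_events = [
    {"user": "admin", "action": "logon"},
    {"user": "admin", "action": "file_read"},
    {"user": "root", "action": "auth_failure"},
]
to_send = filter_events(local_events)
```

Here only two of the three local events would leave the host; the `file_read` event is dropped before it ever touches the network.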

After analyzing these weaknesses, we concluded that we were not prepared to put up with such shortcomings, and decided to develop our own agent. The resulting SAFED (Security Auditing ForwardEr Daemon) agent was based on Snare and Epilog and was made available to the community as a new option for collecting logs (GitHub: SAFED). We decided to use rsyslog to capture events in Linux. In the first version, we settled on Apache Solr for indexing the logs, and we developed our own interface for searching logs via Solr. All this meant that we were extremely well equipped to face the demands of 2010.


Fig. 2 SAFED agent

Added Value – We want more!

Although our solution enabled compliance with the prescribed directives, it did not present any particularly large advantages for IT management. We were therefore very keen to develop our solution further so that it would provide our customers with additional benefits in terms of IT service management.

Customer feedback allowed us to get to know the more sophisticated requirements in the world of log management and security information and event management (SIEM). Before we knew it, our task was no longer just about gathering and archiving logs. Instead, we were faced with a new list of demands:

  • Data Aggregation: Aggregation of data from varying sources (network, security, server, databases, applications)
  • Correlation: Detection of common attributes for the bundling of events
  • Alerts: Automatic analysis of correlated events to send out notifications
  • Dashboards: Use of dashboards to display event data
  • Conformity: Reports on the fields of security, governance, and auditing
  • Retention: Long-term storage of historical data
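
To make the correlation and alert items more concrete, here is a small sketch that bundles authentication failures by a common attribute (host and user) and flags any pair that accumulates several failures within a short time window. The field names, the five-minute window, and the threshold of three are assumptions for illustration, not values taken from NetEye.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate_failures(events, window=timedelta(minutes=5), threshold=3):
    """Return (host, user) pairs with at least `threshold` failures inside `window`."""
    by_key = defaultdict(list)
    for event in events:
        if event["action"] == "auth_failure":
            key = (event["host"], event["user"])
            by_key[key].append(datetime.fromisoformat(event["time"]))

    alerts = []
    for key, times in by_key.items():
        times.sort()
        start = 0
        # Sliding window over the sorted timestamps of this (host, user) pair.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                alerts.append(key)
                break
    return alerts


sample = [
    {"host": "db02", "user": "root", "action": "auth_failure", "time": "2014-01-10T09:00:00"},
    {"host": "db02", "user": "root", "action": "auth_failure", "time": "2014-01-10T09:01:00"},
    {"host": "db02", "user": "root", "action": "auth_failure", "time": "2014-01-10T09:02:00"},
    {"host": "srv01", "user": "admin", "action": "auth_failure", "time": "2014-01-10T09:00:00"},
    {"host": "srv01", "user": "admin", "action": "auth_failure", "time": "2014-01-10T10:00:00"},
    {"host": "srv01", "user": "admin", "action": "auth_failure", "time": "2014-01-10T11:00:00"},
]
alerts = correlate_failures(sample)
```

In this sample only the `db02`/`root` pair triggers an alert, because its three failures fall within two minutes, while the failures on `srv01` are an hour apart.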

The combination of the SAFED agent, rsyslog and Solr was no longer sufficient. We thus began searching once again for suitable tools to adapt NetEye to the new market requirements. We came across the Elastic Stack in January 2014. Our developers spent a good amount of time evaluating it and detected an opportunity to expand NetEye into a fully-fledged log management solution with the help of the Elastic Stack.

We elected to integrate the Elastic Stack into NetEye. The main reasons for this decision were:

  • the simplicity of implementation
  • the Logstash parsing tool
  • the cluster capabilities of Elasticsearch
  • the scalability
  • the use of Kibana as an interactive dashboard

[To be exact, in the interim we used Grok as a parser, which was technically quite complex. When Grok was integrated into Logstash, it became easier to use. This was a further argument for the integration of the Elastic Stack.]
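
Grok patterns are, in essence, named regular expressions that turn an unstructured log line into named fields. The sketch below imitates that idea with Python's `re` module. It is not Logstash or Grok syntax, and the pattern and the sample line are assumptions made up for illustration.

```python
import re

# A hand-written pattern for a classic syslog-style line (illustrative only).
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[\w\-/]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_line(
    "Mar  1 09:15:02 db02 sshd[4321]: Failed password for root from 10.0.0.5 port 51234 ssh2"
)
```

Once a line has been split into fields such as `host` and `program`, it can be stored as a structured document and searched by attribute.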

Our web search interface was replaced with Kibana. Solr gave way to Elasticsearch. From that point on, we used Logstash as a log parser. In addition, Elasticsearch allowed us to carry out aggregation and indexed searches. 


Integrated Open Source Projects EN.png

Fig. 3 Elastic Stack Integration

The only thing we still lacked for the essential SIEM functions was an event handler: an element that reacts to incoming events and, depending on the type of incident involved, triggers a specific action. We developed the NetEye Event Handler to gather Syslog events, e-mails, SNMP traps, and SMS messages and assign appropriate actions using a "rule matching engine". (You can find more details on the NetEye Event Handler on our blog.)
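
The principle of a rule matching engine can be sketched in a few lines: each rule pairs a set of conditions with an action, and the first rule whose conditions all match decides what happens. This is a simplified illustration and not the NetEye Event Handler itself; the rule fields, the action names, and the first-match-wins policy are assumptions.

```python
# Hypothetical rules: (conditions that must all match, action to trigger).
RULES = [
    ({"source": "snmp_trap", "severity": "critical"}, "send_sms"),
    ({"source": "syslog", "severity": "critical"}, "open_ticket"),
    ({"source": "email"}, "notify_operator"),
]


def match_action(event, rules=RULES, default="ignore"):
    """Return the action of the first rule whose conditions all match the event."""
    for conditions, action in rules:
        if all(event.get(key) == value for key, value in conditions.items()):
            return action
    return default


action = match_action({"source": "syslog", "severity": "critical", "host": "srv01"})
```

Ordering the rules from most to least specific keeps the behavior predictable, since an event is handled by exactly one rule.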

The Result – We're proud of what we've achieved!

Our in-house developments and the integration of the Elastic Stack resulted in the release of NetEye 3.5 with a comprehensive log management module in 2014. This tool gathers, indexes, and aggregates events. It also enables individual searches and can react to all events automatically. We are pleased with the result and are happy that our clients no longer have to limit themselves to following the data protection directives. Instead, they now have all of the advantages of a comprehensive log management system at their disposal.


Fig. 4 Kibana Dashboard

The Future – There's still a lot to do!

Software Metering

We realized that the foundations we have created can serve many more fields of application. Customers are currently asking for application metering, more specifically software metering on a Citrix farm. We use our SAFED agent to gather the events for this purpose (application start and end per user). The events are then stored in Elasticsearch and presented via the Kibana dashboard. This example underlines once again how flexible NetEye log management is when based on the Elastic Stack.
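
As a rough illustration of what software metering derives from such start and end events, the sketch below pairs them per user and application and sums the run time. The event layout (`user`, `app`, `type`, `time`) is hypothetical, and timestamps are assumed to be ISO 8601 strings in a single format so that they sort chronologically.

```python
from collections import defaultdict
from datetime import datetime


def usage_seconds(events):
    """Sum run time per (user, application) from paired start/end events."""
    open_sessions = {}
    totals = defaultdict(float)
    for event in sorted(events, key=lambda e: e["time"]):
        key = (event["user"], event["app"])
        when = datetime.fromisoformat(event["time"])
        if event["type"] == "start":
            open_sessions[key] = when
        elif event["type"] == "end" and key in open_sessions:
            # Close the session and add its duration to the running total.
            totals[key] += (when - open_sessions.pop(key)).total_seconds()
    return dict(totals)


metering_events = [
    {"user": "alice", "app": "cad", "type": "start", "time": "2015-01-12T09:00:00"},
    {"user": "alice", "app": "cad", "type": "end", "time": "2015-01-12T09:30:00"},
    {"user": "alice", "app": "cad", "type": "start", "time": "2015-01-12T10:00:00"},
    {"user": "alice", "app": "cad", "type": "end", "time": "2015-01-12T10:10:00"},
    {"user": "bob", "app": "erp", "type": "start", "time": "2015-01-12T09:05:00"},
]
usage = usage_seconds(metering_events)
```

Sessions without a matching end event, such as the one for `bob` above, are simply left open and not counted in this sketch.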

Network Performance Monitoring

In the field of Network Performance Monitoring, we measure the usage of networks by gathering NetFlow data with the help of our nBox appliance. There is also potential here to store the collected NetFlow data directly in the Elastic Stack. Logstash is capable of receiving NetFlow v5 and v9 (we had to make a few improvements to NetFlow v9, but these were relatively uncomplicated). NetFlow dashboards can be presented in Kibana 4. As the Kibana dashboards are so powerful in terms of navigation and aggregation, they present the IT world with a plethora of possibilities.
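
To give an impression of what receiving NetFlow v5 involves, the following sketch decodes an export packet with Python's `struct` module. It follows the commonly documented v5 layout (a 24-byte header followed by 48-byte flow records) and is independent of Logstash; the selection of fields and their names in the result are choices made for this example.

```python
import socket
import struct

# NetFlow v5 layout in network byte order: 24-byte header, 48-byte records.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_netflow_v5(packet):
    """Decode a NetFlow v5 export packet into a list of flow dictionaries."""
    version, count = HEADER.unpack_from(packet, 0)[:2]
    if version != 5:
        raise ValueError(f"not a NetFlow v5 packet (version={version})")
    flows = []
    for i in range(count):
        rec = RECORD.unpack_from(packet, HEADER.size + i * RECORD.size)
        flows.append({
            "src_addr": socket.inet_ntoa(rec[0]),
            "dst_addr": socket.inet_ntoa(rec[1]),
            "packets": rec[5],
            "bytes": rec[6],
            "src_port": rec[9],
            "dst_port": rec[10],
            "protocol": rec[13],
        })
    return flows
```

Each resulting dictionary could be indexed as one document, which is what makes aggregations such as "bytes per destination port" possible in a dashboard.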

This is where the next group of challenges comes knocking at our door. How well can Elasticsearch deal with millions of NetFlow flows? We have deployed our network probes in 10G networks where millions of such flows arise every second.

Performance Data at the I/O Level

A further project we want to address is the collection of performance data in a VMware ESX environment and in SAN systems at the I/O level. In this project, we will use the VMware SDK to collect data for VMs, datastores, and LUNs, and thereby identify the top talkers among the VMs. This allows an admin to see immediately which VM is generating an extreme amount of I/O load on the SAN. Even when several SANs (possibly from various manufacturers) are present, the admin won't lose track and will be able to find an answer in a reasonable amount of time.
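
Identifying top talkers amounts to summing the I/O per VM and ranking the result. The sketch below shows that aggregation on hypothetical samples; the field names (`vm`, `read_iops`, `write_iops`) are assumptions and do not reflect what the VMware SDK actually returns.

```python
from collections import Counter


def top_talkers(samples, n=3):
    """Rank VMs by their total read and write I/O across all samples."""
    totals = Counter()
    for sample in samples:
        totals[sample["vm"]] += sample["read_iops"] + sample["write_iops"]
    return totals.most_common(n)


io_samples = [
    {"vm": "vm-db", "read_iops": 900, "write_iops": 600},
    {"vm": "vm-web", "read_iops": 100, "write_iops": 50},
    {"vm": "vm-db", "read_iops": 800, "write_iops": 700},
    {"vm": "vm-mail", "read_iops": 300, "write_iops": 200},
]
ranking = top_talkers(io_samples, n=2)
```

The same ranking could be produced by an aggregation in the search backend once the samples are indexed; the code only shows the underlying calculation.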

Summary

To conclude, I would like to say that we have only begun to scratch the surface of the potential applications of the Elastic Stack within NetEye. It will be exciting to see which market demands can be met using the Elastic Stack in the future. New possibilities and fields of application for such a flexible technology are also constantly emerging in the community.


Georg Kostner has more than 17 years of experience working within the Würth Group in the IT System Management and Software Development sectors. At the beginning of his professional career, he was involved in the implementation of ERP applications and framework developments. To this day, he remains dedicated to innovative technological research projects and, above all, retains an interest in Open Source software. As the head of the System Integration department, he is currently responsible for WÜRTHPHOENIX NetEye and WÜRTHPHOENIX EriZone, solutions that have been developed in-house.