05 July 2017 Engineering

The Future of Data: Big, Fast, Ubiquitous

By Dimitri Marx

Editor's Note (September 7, 2018): This post refers to X-Pack. Starting with the 6.3 release, the X-Pack code is now open and fully integrated as features into the Elastic Stack.

It would be hard to find a modern business whose volume of data to process is shrinking. Most businesses now also face a requirement to process data faster and to deliver more actionable insights in near real time.

Innovative businesses are always drowning in data that they would love to be able to make sense of and extract valuable insights from. This challenge is unlikely to ever go away: as our ability to handle data increases, so does the human desire to explore and find new sources of data to search and analyse.

As part of this, large-scale data analysis is moving into more and more areas of business. Senior executives, sales, marketing, customer relations, finance, operations, logistics and nearly all other areas of the modern business now have access to growing troves of data, from which they can unlock valuable competitive advantage, improve existing business processes, and build new applications.

Data analysis is moving from being the walled realm of the data scientist to becoming an everyday business tool. It is becoming so ubiquitous in modern business that it has become normalised. This poses new challenges for the development of data analysis tools and software. Increasingly, the task is to make highly complex technologies simple and intuitive for an ever-increasing number of new end users. Usability, not just scale, is becoming a vital attribute.

Search at the Core

Data analysis, whether or not we call it "Big", all boils down to the power of search. Individuals are looking to gain insights and knowledge from data. A decade ago, very few average users would have immediately grasped all the possibilities of what can be achieved with "search". But open source technologies like Elasticsearch have made it simple to map any new problem domain model to "search", and have made crossing this mental barrier much easier.

At Elastic, we have seen an explosion of use cases where our technologies are being used to power "not your typical search use case." Our users keep finding innovative ways to use Elasticsearch, and it is perhaps one of the hallmarks of a great open source project that it allows users to reach a level of creativity that they initially never even imagined.

"Big" data is, almost by definition, heterogeneous. The name "Elasticsearch" is a reference to this flexible combination of free text search, structured search, and analytics. It should not matter whether the data is a typical web page or Word document or, to a degree, a location on Foursquare, a trade in a bank, a web server log, or a metric of some sort. All are effectively a combination of structured and unstructured data that people want to explore and search through, regardless of the shape or volume of the data. Although the data content itself is interesting, if search works, the data becomes, to a degree, irrelevant.

The Future of Search

If we look at the trends in enterprise data analysis over the last couple of decades, they have largely been driven by advances in search technology: the ability to search for correlations across ever more dimensions or facets of a dataset, to search unstructured data, or simply to search ever greater volumes. New data storage and indexing technologies have certainly played a part, but largely to enable new, more powerful forms of search.

The latest data technologies, such as Graph analytics and machine learning, are essentially more sophisticated applications of search. Graph analytics allows the user to search for new connections in the data, independently of the need for structure in the underlying dataset. In a world where technology offers almost overwhelming search possibilities, this provides a faster, more powerful way to explore data and unlock important trends and relationships. Graph provides a form of meta-analysis, searching for which trends warrant deeper analysis or ongoing monitoring.

Even machine learning has a search requirement at its core. Search technologies have long been used to look at the behaviour of data over time and identify key indicators for significant events. A common example is in IT operations, where analysing historical application, server and network logs is used to identify indicators for impending system failure.

In the past, companies have required skilled data scientists to build statistical models and define thresholds for each indicator they identified. This has been a complex and laborious task, which has still often resulted in a high level of false positives when using the models to monitor live data.
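To make the false-positive problem concrete, here is a minimal Python sketch. It is an illustration only, not how Elastic's machine learning features work: the `latency_ms` series, the threshold of 105 and both helper functions are hypothetical. It compares a hand-picked fixed threshold with a simple check against each value's own recent baseline, on a metric that drifts slowly upwards and contains one genuine spike.

```python
from statistics import mean, stdev

# Hypothetical metric: slow upward drift with a single genuine spike (index 8).
latency_ms = [100, 102, 101, 103, 102, 104, 103, 105,
              180, 106, 107, 108, 110, 112, 111, 113]


def static_threshold_alerts(values, threshold):
    """Return indices where a value exceeds a hand-picked fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def rolling_zscore_alerts(values, window=5, z_limit=3.0):
    """Return indices where a value deviates strongly from its recent baseline."""
    alerts = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines to avoid dividing by zero.
        if sigma > 0 and abs(values[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


static_alerts = static_threshold_alerts(latency_ms, threshold=105)
rolling_alerts = rolling_zscore_alerts(latency_ms)

print("fixed threshold:", static_alerts)
print("rolling baseline:", rolling_alerts)
```

On this made-up series, the fixed threshold keeps firing once normal drift carries the metric past 105, while the rolling check flags only the spike. Even so, the rolling check still relies on a hand-chosen window and limit, which is the kind of manual tuning described above.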

The result is that behavioural analytics has been limited to large mission-critical data centres and high-return areas such as financial trading. But machine learning and, more specifically, behavioural analytics tools are rapidly increasing in power and capability, with the ability to auto-generate machine learning models with a far higher degree of accuracy. Software tools can now provide organisations with the type of analysis capabilities that, just three or four years ago, would have demanded a team of postgraduate data scientists. And this is opening up incredible new search-based applications to all areas of the business.

In all businesses, the volumes of data requiring analysis will never shrink. But size is only one dimension: "Big" is becoming bigger, faster (real-time) and predictive; analytics is acquiring its own ability to understand and learn from data; and all of this technology is being put in the hands of a far greater number of users across the enterprise.


Dimitri Marx:

Dimitri joined Elastic as a Solutions Architect based in Augsburg, Germany. A Computer Science graduate, Dimitri has been working with open source technologies for the last 10 years, developing software, delivering consultancy and helping build solutions around open source products, both in the search space and beyond.