How do you build open data portals for your customers to share hundreds of millions of records?
By building your platform on top of Elasticsearch to leverage performance and advanced search functionality
Case study highlights
Provide high performance
- Scale to hundreds of millions of records per customer
- Gain indexing speed for near real-time data
- Deliver search query results in less than one second
Gain a competitive edge
- Offer advanced search functionality such as geo-clustering and analytics
- Leverage openness for development agility
- Accelerate time to market for new features
Searching hundreds of millions of records
OpenDataSoft is a France-based provider of tools to publish open data on the web. OpenDataSoft customers like the City of Paris use the OpenDataSoft Software-as-a-Service (SaaS) platform to share data with partners, developers and the general public. OpenDataSoft provides a standard data portal, as well as APIs that enable developers to integrate the data into their applications. A single OpenDataSoft customer can have hundreds of millions of records, so search is a key function of the solution, enabling end users to find the right data.
"Before Elasticsearch, we were using a different type of search engine, and there were several challenges," recalls David Thoumas, Chief Technical Officer at OpenDataSoft. "The previous solution could not provide near real-time search or the openness we require, because it was proprietary."
Gaining advanced search features
OpenDataSoft switched to Elasticsearch as the search engine for both the standard data portal – a simple tool enabling anyone to explore the data provided by OpenDataSoft's customers – and the APIs for developers.
"We use Elasticsearch as the main back end," Thoumas explains. "For the standard data portal user interface, Elasticsearch provides the search function. The end-users can search for the dataset by the metadata. Once they find the right dataset, they go to a dataset landing page where they can search by the fields. Or they can navigate by Elasticsearch facets."
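The metadata search with facet navigation described above could be expressed as a single Elasticsearch request body. The following is a minimal sketch, not OpenDataSoft's actual query: the function name and the field names (title, description, keywords, publisher, theme) are hypothetical, and it uses terms aggregations, which is how recent Elasticsearch versions express what older versions called facets.

```python
import json


def build_catalog_search(text, facet_fields, size=10):
    """Build a search body: full-text match on dataset metadata,
    plus one terms aggregation per facet field."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # Hypothetical metadata fields; a real catalog defines its own.
                "fields": ["title", "description", "keywords"],
            }
        },
        "aggs": {
            # Each terms aggregation returns the most frequent values of a
            # field with their document counts, which a UI can show as facets.
            field: {"terms": {"field": field, "size": 20}}
            for field in facet_fields
        },
    }


body = build_catalog_search("bike stations", ["publisher", "theme"])
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a search request; the aggregation buckets in the response supply the values and counts for each facet.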
OpenDataSoft provides geo-search in addition to basic search features. If a dataset contains geographical information, the end user can display the results of the query on a map. On top of the basic geo-search, OpenDataSoft also implemented a geo-clustering algorithm to make it possible for anyone to display millions of points on a single map.
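To illustrate why clustering makes millions of points displayable on one map, here is a minimal sketch of grid-based clustering in plain Python. It is an illustration only: the case study does not describe OpenDataSoft's algorithm, and the function name, cell size, and sample coordinates are assumptions. Elasticsearch offers a comparable server-side grouping through its geohash_grid aggregation.

```python
import math
from collections import defaultdict


def cluster_points(points, cell_deg):
    """Group (lat, lon) points into square grid cells of cell_deg degrees
    and return one cluster (centroid and count) per non-empty cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        # Points falling in the same cell share the same integer key.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            "lat": sum(p[0] for p in members) / n,
            "lon": sum(p[1] for p in members) / n,
            "count": n,
        })
    # Largest clusters first, so a map can draw them most prominently.
    return sorted(clusters, key=lambda c: -c["count"])


# Sample coordinates (three near Paris, one near Lyon), for illustration.
points = [(48.8566, 2.3522), (48.8570, 2.3530), (48.8600, 2.3400), (45.7640, 4.8357)]
clusters = cluster_points(points, cell_deg=0.5)
for c in clusters:
    print(c)
```

The map then renders one marker per cluster instead of one per point, so the amount of data sent to the browser depends on the number of occupied cells rather than the number of records. A larger cell_deg suits low zoom levels, a smaller one high zoom levels.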
"Near real-time was key to us, and Elasticsearch gives us what we need to build that set of features in our solution."
For developers, OpenDataSoft provides three APIs, all built on top of Elasticsearch. First, the Catalog search API allows developers to search a dataset catalog. Second, the Geo API is used to build clustered results that can be displayed on a map. Third, the Analytics API is used to retrieve time series data.
"One of the great advantages of Elasticsearch is the openness of the solution," says Thoumas, "especially compared to a proprietary solution. There is a range of business logic available as Elasticsearch plug-ins. For example, what we did with the geo-clustering API would not have been possible without the access to the plug-ins in Elasticsearch. Same for the analytics tools. Our analytics features could not have been possible if we did not have Elasticsearch."
Meeting massive growth with scalability
Currently, OpenDataSoft's largest customer has 50 million records, but the company expects customers to go far beyond that in 2014. The system has been designed to scale so that a single customer can grow to hundreds of millions of records.
OpenDataSoft is expecting major growth because the company is preparing to launch a worldwide coverage campaign, expanding operations that are currently focused on France and other parts of Europe. The objective is to have a platform robust enough for any customer in the world by the end of 2015. By using Elasticsearch, OpenDataSoft has been able to create that robust platform and glean actionable insights from its ever-growing set of customer data.
"OpenDataSoft's growth is based on the number of customers and on the success of the customers in their open data projects," Thoumas says. "Thus, horizontal and vertical scalability, both at the heart of Elasticsearch's design, are key for us."
"We are a SaaS provider with a multitenant approach, and we are reducing costs by federating customers in the same deployment space, so the scalability and elasticity of Elasticsearch are a key part of our business," adds Thoumas.
Accelerating indexing and search speed
"Performance is a key point for us in two ways," Thoumas outlines. "First is throughput on indexing time. Since OpenDataSoft works with utilities and transportation companies, handling large volumes of data in near real time is key. The way Elasticsearch provides near real-time indexing is very important to us. With its lightweight indexing approach, Elasticsearch is the perfect fit for data preparation while still keeping near-real-time latency."
"Second is search speed. Elasticsearch makes it possible for us to run geo-clustering queries on 3 million points of interest in less than one second. This is important to us because this is the kind of feature we want to provide to anyone using our solution, even the general public. Thanks to this ability from Elasticsearch, we are able to provide any user with tools, such as geo-search, that would otherwise be way too costly to produce."
"We use Elasticsearch analytics features to display time series or any kind of series data on a chart."
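A time series chart of this kind is typically fed by a date histogram aggregation, which buckets records by time interval and computes a metric per bucket. The sketch below builds such a request body under stated assumptions: the function name, the field names (timestamp, passengers), and the aggregation names are hypothetical, and the calendar_interval parameter belongs to recent Elasticsearch versions (older versions used interval).

```python
import json


def build_time_series(date_field, value_field, interval="month"):
    """Build an aggregation-only request: one bucket per calendar interval,
    with the average of value_field computed inside each bucket."""
    return {
        "size": 0,  # return no individual hits, only aggregation results
        "aggs": {
            "over_time": {
                "date_histogram": {
                    "field": date_field,
                    "calendar_interval": interval,
                },
                "aggs": {
                    "avg_value": {"avg": {"field": value_field}},
                },
            }
        },
    }


# Hypothetical field names for a transportation dataset.
series_body = build_time_series("timestamp", "passengers")
print(json.dumps(series_body, indent=2))
```

Each bucket in the response carries a timestamp and the computed average, which map directly to the x and y values of a chart.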
Gaining a competitive edge with Elasticsearch
"Agility is essential for a startup operating in the data management area like OpenDataSoft," Thoumas points out. "Agility is key, especially for our positioning as a small startup in such a gigantic space – agility for us to quickly produce advanced features. Elasticsearch makes it possible to anticipate user needs as well as to react to customer challenges, thanks to its openness and elasticity. Elasticsearch makes it possible for OpenDataSoft to stay ahead of the competition by quickly bringing advanced features to market."
"Elasticsearch is a very powerful asset, helping us to compete in a very competitive market," Thoumas concludes.
High performance search
Elasticsearch provides OpenDataSoft with high performance: fast indexing for near real-time search, and query response times of less than one second.
Elasticsearch's geo capabilities fuel OpenDataSoft's advanced geo search features, which are key capabilities for the markets OpenDataSoft serves, such as municipal governments and transportation.
Scalability and elasticity
With Elasticsearch, OpenDataSoft is better equipped to handle massive growth, enabling customers to make hundreds of millions of records accessible and searchable through their open data portals.
Elasticsearch provides OpenDataSoft with the development agility to quickly bring new features to market, strengthening the company's position in a competitive market.