15 March 2017 User Stories

Found it! Document Retrieval at ARQUO with Elasticsearch and Elastic Cloud

By António Sargento

In this post, I would like to share my experience using Elasticsearch and Elastic Cloud on an ARQUO project and explain the main reasons why Elasticsearch was the right choice for us.

ARQUO is a cloud document and business process management platform that helps enterprises and BPO service providers implement document processing projects, such as mailroom processing, purchase order workflows or expense management, quickly and efficiently.

Document Storage & Retrieval

One of the most important components of this platform is the digital archive. For a digital archive like ours, a major requirement is the ability to store information (documents or digital assets) such as invoices, purchase orders, photos, and audio or video files, and to retrieve it when needed.


Free-text search: searching for photos of Cristiano Ronaldo.

Document retrieval can be challenging if the information about the digital assets (their metadata) is not correctly organised and indexed. To help our customers search the stored documents and find what they're looking for, we needed a search engine that could index and organise the document metadata and, where necessary, the document content itself (the text extracted by OCR).
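To make this concrete, here is a minimal sketch of how metadata and OCR text might be described in an Elasticsearch index mapping. The field names (`tenant_id`, `document_type`, `title`, `description`, `people`, `amount`, `created_at`, `ocr_text`) are hypothetical, not ARQUO's actual schema, and the typeless mapping syntax assumes Elasticsearch 7.x or later. Exact-match fields use `keyword` so they can be filtered and aggregated, while free text uses `text` so it is analysed for full-text search.

```python
import json

# Hypothetical mapping for archived documents. Field names are illustrative
# assumptions; typeless syntax as accepted by Elasticsearch 7.x and later.
DOCUMENT_MAPPING = {
    "mappings": {
        "properties": {
            # Tenant identifier, used to keep each customer's data separate.
            "tenant_id": {"type": "keyword"},
            # Document category, e.g. "invoice", "purchase_order", "photo".
            "document_type": {"type": "keyword"},
            # Free-text metadata, analysed for full-text search.
            "title": {"type": "text"},
            "description": {"type": "text"},
            # People tagged in a photo; keyword so results can be aggregated.
            "people": {"type": "keyword"},
            # Numeric and date metadata, suitable for range queries.
            "amount": {"type": "double"},
            "created_at": {"type": "date"},
            # Optional text extracted by OCR, indexed for full-text search.
            "ocr_text": {"type": "text"},
        }
    }
}

if __name__ == "__main__":
    # The JSON body one would send with: PUT /<index-name>
    print(json.dumps(DOCUMENT_MAPPING, indent=2))
```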


A more structured search using a range query: all expenses with amounts between 25€ and 100€.
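A range search like the one just described can be expressed with the Elasticsearch Query DSL as a `bool` query whose `filter` clause combines a `term` query and a `range` query. This is a sketch only: the `document_type` and `amount` fields and the `"expense"` value are hypothetical, and it assumes both bounds are inclusive (`gte`/`lte`).

```python
import json


def expense_range_query(min_amount, max_amount):
    """Build a _search body for expenses whose amount lies in [min_amount, max_amount].

    Field names ("document_type", "amount") are hypothetical assumptions.
    """
    return {
        "query": {
            "bool": {
                # Filter clauses narrow results without affecting scoring.
                "filter": [
                    {"term": {"document_type": "expense"}},
                    # Inclusive bounds: gte = greater than or equal, lte = less than or equal.
                    {"range": {"amount": {"gte": min_amount, "lte": max_amount}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    # The JSON body one would send with: GET /<index-name>/_search
    print(json.dumps(expense_range_query(25, 100), indent=2))
```

Placing both conditions in `filter` rather than `must` fits this kind of structured search: the conditions are yes/no, so relevance scoring is unnecessary.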

Elasticsearch and Elastic Cloud to the Rescue

In the past, we had used Lucene on other projects (on-premises, with no SaaS requirements). ARQUO's requirements were different: we needed a distributed, multi-tenant, full-text search engine with a REST interface and schema-free JSON documents. It was also a plus if the tool was offered as SaaS, since at ARQUO we prefer to consume it as a managed service rather than deploy it ourselves.

That is when we started looking at Elasticsearch, since it matched all our requirements. At the same time, Elastic Cloud (originally known as Found) was just launching its managed Elasticsearch offering, which provided clustering, security and management capabilities along with simple cluster provisioning.

Elasticsearch features such as full-text search, filtered queries, custom aggregations and fast response times are all important to us because we work with unstructured data. They let our customers search across their data faster, and because aggregations organise the results into meaningful groups, they also help customers find the information they need more easily.
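As an illustration of how these features combine in a single request, the sketch below pairs a full-text `multi_match` query with a `term` filter that restricts results to one tenant, and adds a `terms` aggregation that groups matching documents by person, similar to the person-based view shown in the screenshots. The field names (`title`, `description`, `ocr_text`, `tenant_id`, `people`) and the aggregation name `by_person` are hypothetical assumptions, not ARQUO's actual queries.

```python
import json


def search_with_person_facets(text, tenant_id, max_people=10):
    """Build a _search body: full-text match, tenant filter, per-person buckets.

    All field names and the aggregation name are hypothetical assumptions.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text match across several analysed text fields.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title", "description", "ocr_text"],
                        }
                    }
                ],
                # Unscored filter so each tenant only sees its own documents.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        },
        # Bucket the matching documents by the keyword field "people".
        "aggs": {"by_person": {"terms": {"field": "people", "size": max_people}}},
    }


if __name__ == "__main__":
    # The JSON body one would send with: GET /<index-name>/_search
    print(json.dumps(search_with_person_facets("Cristiano Ronaldo", "tenant-42"), indent=2))
```

The response would then contain hit documents plus a list of `by_person` buckets with document counts, which a user interface can display as facets for narrowing the search.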

The available tooling is also very important for data analysis and management.

We signed up for the hosted Elasticsearch service as early adopters and beta testers, creating our first cluster in May 2013, and we have been happy customers ever since, first of Found and now of Elastic.


Results aggregated by the person shown in each image.

The Future is Bright

In the near future, we expect Elastic to keep evolving and adding new features that we can integrate into our architecture. One Elastic capability we would like to adopt soon is log analysis: we aim to improve how our event logs are handled by using Logstash and Kibana.

We have no reason to switch to another service provider, even though there are many other players out there (Qbox or AWS Elasticsearch, for instance). Elastic Cloud has proven to be the best option for us: it always runs the latest Elasticsearch version, and it offers a good set of tools, a compelling offering at a good price and a great pool of experts. Speaking of which, I would like to leave a couple of final words for the support team:

Keep up the good work! You've always been available to help, even with the rather difficult problems I had to solve. I really appreciate your work and expertise. Please carry on exactly as you are.

Bio:

António Sargento is Chief Architect at ARQUO and has been designing and developing software professionally for about 25 years. A distributed computing and cloud enthusiast, for the last 5 years he has been pursuing a new challenge: designing and developing cloud software.