17 June 2015 User Stories

Elasticsearch: Powering Real-Time Mobile and Web Analytics for LeadBoxer and Opentracker

By Leslie Hawthorn

Cralan Deutsch and Wart Fransen are co-founders of Netherlands-based LeadBoxer and Opentracker. LeadBoxer is an actionable tool for B2B sales agents which delivers qualified leads via real-time signals, based on proprietary customisable lead scoring technology. Opentracker specialises in web tracking, data analytics and statistics innovation, while its hallmarks are simple, intuitive, and easy-to-read reporting interfaces, combined with an enterprise-class API.

Opentracker.net is an analytics company focusing on website and mobile app user behavior. Today, our thousands of clients, from webmasters to corporate, financial, and governmental institutions, have become addicted to seeing exactly who is using their online services, and how, in real time, letting them track users, improve client experience, and increase conversion rates. Read on to learn how Opentracker.net and our experiences with Elasticsearch led us to create LeadBoxer, a sophisticated web- and mobile-based lead capture and qualification system.

In our early stages, like a lot of other companies founded during this period, Opentracker’s technology was built around MySQL. While it provided a great starting point, MySQL suffered from inherent bottlenecks when it came to scaling and structuring big data. As NoSQL solutions began appearing, we explored them and eventually settled on Cassandra, which resolved many of our data maintenance, redundancy, storage, and scaling issues. The one thing it did not provide us with, however, was real-time search.

Real-time searches are important because every client has specific insights they want to extract from their data, and they need the flexibility to find exactly what they need to improve their business. Some clients are interested in users from certain locations, so that they can focus their on-boarding efforts by sales region, while others are interested in finding users on conversion pages that pre-qualify visitors as interested in a specific product or service. Others are looking for users who match specific lead qualification criteria in order to increase the chance of making sales, allowing sales agents to target potential customers more effectively and make warmer calls.

In order to provide this functionality, we experimented with Apache Solr and Elasticsearch solely for the purpose of running real-time searches on both web and mobile app user data. Elasticsearch worked smoothly and presented an easy learning curve, while its schemaless structure allowed our clients to search through high volumes of unstructured data with flexibility, stability, and good response times.

After implementing Elasticsearch, we started getting the same request from our B2B users: "Show me the companies who are visiting, and help me get in touch with them." An opportunity presented itself: to develop a new service that could provide intelligence in the process of generating and capturing leads. We branched the technology that targeted the identification of companies, and LeadBoxer was born.

LeadBoxer’s value proposition? Identify sales leads from online activity, then qualify the leads based on metrics the client finds important. (You can check out the service at leadboxer.com by setting up a trial account.)

After implementing Elasticsearch in Opentracker, we realised we could use it to power LeadBoxer’s basic functionality by manipulating the scoring and ordering algorithms. Our guess was that we could utilize the function_score and boosting queries to help sort and rank leads by importance.

At the time, documentation on the subject was limited and distributed through many different community channels, so we started to develop our own API to query Elasticsearch in order to create a lead scoring system that could be customized and generated by our customers.

We were taking a risk. We thought that search engines were designed for searching through documents, not qualifying sales leads! The big question was: would it work?

The short answer is yes.

We now use Elasticsearch to deliver useful, qualified, and beautifully-designed lead results to sales and marketing teams. Our customers can adjust sliding weights to influence a lead score preference, and the preferences are converted into percentile values. The values are stored and converted to a formula in the API call, which in turn translates the formula to an Elasticsearch query. The Elasticsearch query returns the leads, ranked and prioritised based on the formula. Changing the weighted preferences gives different leads, allowing sales teams to focus on leads that they want to target.

So, how do the lead score settings in the LeadBoxer UI translate to values? By using the AngularJS framework, we built a settings page where our users can set their scoring preferences with sliders. The values that are set by these sliders can range from 0 to 20.

We decided that our lead score value should be a number between 1 and 100, with 100 being the perfect lead. Because we keep on adding sliders and ways to influence the lead score, we need to convert these absolute numbers into percentiles. We achieve this with a formula:

Formula value = ( value in the slider / total_of_all_slider_values ) x maximum_total, where maximum_total is 100
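
To make the arithmetic concrete, here is a minimal Python sketch of that normalisation. It assumes the formula scales each slider value so that all weights together add up to the maximum total of 100; the function name and the slider names are hypothetical and only serve the illustration.

```python
def normalise_slider_weights(sliders, maximum_total=100):
    """Scale raw slider values (0 to 20 each) so that they sum to maximum_total."""
    total = sum(sliders.values())
    if total == 0:
        # No slider is set, so no criterion contributes to the score.
        return {name: 0.0 for name in sliders}
    return {name: value * maximum_total / total for name, value in sliders.items()}


# Hypothetical slider settings; the names are illustrative only.
sliders = {"industry": 10, "pageviews": 5, "has_email": 5}
weights = normalise_slider_weights(sliders)
print(weights)  # {'industry': 50.0, 'pageviews': 25.0, 'has_email': 25.0}
```

Because the weights are relative, adding a new slider later does not push the maximum score above 100; it only redistributes the share each criterion receives.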

We use boosting/scoring in different ways to calculate match, criteria and range. We defined three different property types that could be used to influence the lead score:

  • contains: when the enriched visitor (lead) contains any value for field X, add 10 points to the lead score
  • exact match: when the lead contains a value X for field Y, add 20 points to the lead score
  • range: when the number of pageviews of a lead is between 5-10, add 5 points to the lead score
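
The three property types above can be sketched in plain Python, independent of Elasticsearch. This is only an illustration of the scoring logic under our reading of the list: the rule dictionary layout, the field names, and the choice to treat the range as inclusive are all assumptions.

```python
def score_lead(lead, rules):
    """Add up the points of every rule that the lead satisfies."""
    score = 0
    for rule in rules:
        value = lead.get(rule["field"])
        kind = rule["type"]
        if kind == "contains":
            # Any non-empty value for the field counts as a match.
            matched = value not in (None, "", [])
        elif kind == "exact":
            matched = value == rule["value"]
        elif kind == "range":
            # Assumption: both ends of the range are inclusive.
            matched = value is not None and rule["min"] <= value <= rule["max"]
        else:
            raise ValueError(f"unknown rule type: {kind}")
        if matched:
            score += rule["points"]
    return score


# Hypothetical rules mirroring the three examples in the list.
rules = [
    {"type": "contains", "field": "email", "points": 10},
    {"type": "exact", "field": "industry", "value": "Insurance", "points": 20},
    {"type": "range", "field": "pageviews", "min": 5, "max": 10, "points": 5},
]
lead = {"email": "jane@example.com", "industry": "Insurance", "pageviews": 7}
print(score_lead(lead, rules))  # 35
```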

These values are then translated into a formula by storing the UI-generated percentiles as URL parameters, e.g. ?criteriaMatch=industry|Insurance|10, which assigns 10 lead score percentile points to any lead whose company profile identifies it as belonging to the Insurance sector.
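
As a rough sketch of how such a parameter could be unpacked on the API side, the Python below splits the pipe-separated value into a field, a value, and a number of points. Only the criteriaMatch parameter and its field|value|points layout come from the example above; the function name and the returned dictionary layout are hypothetical, and the sketch assumes that the value itself never contains a pipe character.

```python
from urllib.parse import parse_qs


def parse_criteria_match(query_string):
    """Turn every criteriaMatch=field|value|points parameter into a rule dict."""
    params = parse_qs(query_string.lstrip("?"))
    rules = []
    for raw in params.get("criteriaMatch", []):
        # Raises ValueError if the parameter does not have exactly three parts.
        field, value, points = raw.split("|")
        rules.append({"field": field, "value": value, "points": float(points)})
    return rules


parsed = parse_criteria_match("?criteriaMatch=industry|Insurance|10")
print(parsed)  # [{'field': 'industry', 'value': 'Insurance', 'points': 10.0}]
```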

These API calls are then translated into an Elasticsearch query by our Java code. For the Elasticsearch client, we are using a transport client. (You can read more about the Elasticsearch Java DSL in the Elasticsearch Java API documentation.)
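
To give an idea of what the resulting query might look like, the sketch below builds a function_score request body as a plain Python dictionary. Python stands in here for our Java code purely for brevity, and the rule layout and field names are hypothetical. Each rule becomes a filtered function whose weight is the number of points, and the matching weights are summed. Note that "weight" is the parameter name in later Elasticsearch releases; older versions such as 0.90.x used "boost_factor" instead.

```python
import json


def build_lead_score_query(rules):
    """Translate scoring rules into a function_score query body."""
    functions = []
    for rule in rules:
        kind = rule["type"]
        if kind == "contains":
            filter_clause = {"exists": {"field": rule["field"]}}
        elif kind == "exact":
            filter_clause = {"term": {rule["field"]: rule["value"]}}
        elif kind == "range":
            bounds = {"gte": rule["min"], "lte": rule["max"]}
            filter_clause = {"range": {rule["field"]: bounds}}
        else:
            raise ValueError(f"unknown rule type: {kind}")
        # A lead that passes the filter receives the rule's points.
        functions.append({"filter": filter_clause, "weight": rule["points"]})
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "score_mode": "sum",      # add up the points of all matching rules
                "boost_mode": "replace",  # ignore the base relevance score
            }
        }
    }


# Hypothetical rules; field names are illustrative only.
rules = [
    {"type": "contains", "field": "email", "points": 10},
    {"type": "exact", "field": "industry", "value": "Insurance", "points": 20},
    {"type": "range", "field": "pageviews", "min": 5, "max": 10, "points": 5},
]
query_body = build_lead_score_query(rules)
print(json.dumps(query_body, indent=2))
```

Since results are sorted by score by default, sending a body like this to the search endpoint would return the highest-scoring leads first.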

We store and index our data by collecting and sending user, session, and event data to our cluster of ‘log’ servers. In turn, the data is processed and enriched before it is stored to both the index of the Elasticsearch cluster and a Cassandra node.

We learned a lot while using all this technology. The information is documented, but we couldn’t find any case studies or examples, which led us at the time to conclude that this powerful technology was underutilised. With 20+ million downloads since then, though, lots of people have clearly figured out how useful Elasticsearch can be for their businesses!

The best part about Elasticsearch is that it's a search engine that lets you manipulate the ranking of what it returns, meaning that you can be your own search engine master and determine the criteria by which results are ranked and therefore presented to the user. Bottom line: using Elasticsearch in this way, you can let the user customise the results, so that the order of results from any system-wide search is based on custom criteria.

For those of you who are fans of the nitty gritty details of our installation, our cluster consists of 4 master/data nodes with a monitoring node. The cluster is using version 0.90.7 and JVM 1.6 update 25. We're planning an upgrade to Elasticsearch 1.6, so stay tuned for more updates!

So, what’s next? We anticipate that with our current setup and development trajectory, utilising Elasticsearch, we are positioned to evolve the way B2B, or even B2C, sales processes are shaped. Similar to an IoT future where only relevant ads and personal offers are displayed, we can present our clients with ready-to-be-picked fruit for sales deals, whereby customers have directly or indirectly indicated they want or need a product or service. Combining all external online (big) data with internal client information and user-behaviour is the key to a perfect sales deal.