15 September 2015 Engineering

Analyzing Salesforce Data with Logstash, Elasticsearch, and Kibana

By Russ Savage

In the spirit of Dreamforce and being part of Elastic, I’m excited to introduce a new open source plugin for Logstash that makes it easy to pull vast amounts of data out of Salesforce and into Elasticsearch.

This plugin allows you to query almost any object in Salesforce and pull it into the Logstash pipeline.  From there, you can output it to your analytics tool of choice (ours is Kibana) to discover insights that you may have missed otherwise. We use it internally to keep an eye on our our sales pipeline and customer health over time. 

Working directly with sales and marketing operations, we outlined a number of challenges they had that might be solved with this solution. Those included:

  • Interactive time-series snapshot analysis across a number of dimensions. By sales rep, by region, by campaign and more.
  • Which sales reps moved the most pipeline the day before the end of month/quarter? What was the progression of Stage 1 opportunities over time.
  • Correlating data outside of Salesforce (like web traffic) to pipeline building and demand. By region/country/state/city and associated pipeline.

It’s very challenging to look back in time and see trends in the data. Many companies have configured Salesforce to save reporting snapshots, but if you're like me, you want to see the data behind the aggregate report. I want the ability to drill down to any level of detail, for any timeframe, and find any metric. We found that Salesforce snapshots just aren't flexible enough for that.

Logstash already allows users to pull in vast amounts of data from a variety of sources, and with this plugin, we are adding Salesforce as a source. Once the data is in our Elasticsearch cluster, we use Kibana to quickly filter and display the data in ways we couldn't before using Salesforce’s built in reporting.

Now, let’s focus on how you can set up the Logstash Input for Salesforce for your own organization.

Querying Salesforce from Logstash

As I mentioned earlier, we’ve been using the Logstash Salesforce Input plugin internally to handle all of this data. We decided others could benefit from this plugin as well so we open sourced it and made it available for anyone to install using the standard Logstash plugin framework.

With a fresh install of the latest Logstash, you should be able to install this plugin with:

bin/plugin install logstash-input-salesforce

Based on your configuration, the plugin will query sObjects from Salesforce and create an event for each row of data returned. You can then filter and transform the data as needed before pushing it to wherever it needs to go.

To configure this plugin, we will need to gather some credentials from Salesforce. This involves creating a new Connected App with OAuth in Salesforce and generating a user token. Depending on your Salesforce permissions, you might need to ask your Administrator to do this for you. You should only need to do this once.

Here is an example of the minimum information you need to enter. Of course, the scope can be configured as needed but just make sure to give enough access for the sObjects you want to query. Also, for the callback URL, you can just enter localhost since the plugin will be using password authentication.

Basic Configuration for Connected OAuth App in Salesforce

Once you have a new application set up, you will need the Consumer Key and Consumer Secret values, so keep track of those.

Basic Salesforce Connected App Showing OAuth Credentials

While you’re in Salesforce, you should also go ahead and generate a token for the username you will be using for authentication. Here are some detailed instructions on how to generate or reset your token if you don’t already have one.

Now we have everything we need for connecting to Salesforce from Logstash.  Let’s start with a simple Logstash config that will connect to your sandbox Salesforce instance (test.salesforce.com) and pull in all the Opportunity objects.

input {  
  salesforce {    
    use_test_sandbox => true #you are testing in a sandbox right?
    client_id => 'CONSUMER_KEY_FROM_SALESFORCE'    
    client_secret => 'CONSUMER_SECRET_FROM_SALESFORCE'    
    username => 'you@example.com.sandbox'    
    password => 's3crEt!'    
    security_token => 'SECURITY_TOKEN_FROM_SALESFORCE'    
    sfdc_object_name => 'Opportunity'  
output { 
  stdout { codec => rubydebug } 

Here, we are telling the plugin to use the test sandbox, connect with our credentials, and pull in all the Opportunity objects. By default, the input will pull in all the fields for that sObject. If you would only like to see selected fields, you can specify them using the sfdc_fields option:

sfdc_fields => ['Id','Name','Type','CloseDate']

I like to pull everything in because it gives me the most flexibility later. By default, the input will use the standard Salesforce API Names as the document key fields. If you’d like to normalize them from camel case (e.g. LastModifiedBy) to something a little more Logstash like (e.g. last_modified_by), you can set the to_underscores option to true.

For a full list of input options and what they do, check out the Logstash Salesforce Input documentation. If you want additional sObjects loaded into Logstash, for now, you will need to duplicate that config for each object. We are currently pulling in Leads, Contacts, Accounts, Opportunities and more using this process.

Also, by default, the plugin uses version 26 of the Salesforce API. If you’re using a newer object, you can specify a different api version using the api_version config option.

Pushing the Data to Elasticsearch

Now that we have the data flowing into Logstash, we can adjust the output to write data wherever we need. For this example, we are going to push the data directly into our cluster using the Elasticsearch output:

output {  
  elasticsearch {    
    index => "salesforce"    
    index_type => "opportunity"    
    host => "localhost"  

That’s the simplest way to push data into an Elasticsearch instance running on your laptop. Of course, if you’d like to not worry about managing your own cluster, you can push the data directly to a Found (hosted Elasticsearch) cluster as well. The advantage there is that in just a few clicks, you can have an Elasticsearch cluster up and running with Shield for security and Kibana for visualizations. We use it internally as much as possible.

This is great for a single download of the data. If you want to keep gathering this information on a regular interval, it’s trivial to setup a cron job to do just that. We have ours collect this data every 24 hours, but you can adjust yours as needed.

0 5 * * * /opt/logstash/bin/logstash -f /opt/logstash/sfdc-snapshot.conf -w 4

Connecting the Data

When you start pulling this data in, you will notice that Salesforce has a lot of Id and Lookup fields. Looking at an 18 character OwnerId really doesn’t help you much.  Once you have a process to load the raw data, you can start to enrich it by connecting the sObjects together.

You might create a separate Logstash configuration for doing just that. As an input, it will query new data from the Salesforce index, use the Elasticsearch filter to look up related documents from an id, attach that document to the original, and finally write that information to the existing index or a separate one. A sample configuration might look like this:

input {
  elasticsearch {
    hosts => ['localhost']
    index => 'salesforce'
    query => '{ "query": { "query_string": { "query":"_type:\"opportunity\" AND @timestamp:[now-12h TO now]" } } }' #everything from the past 12 hours.
filter {
  if [OwnerId] {
    elasticsearch {
      hosts => [ 'localhost:9200' ]
      query => 'id:"%{[OwnerId]}" AND _type:user AND @timestamp:[now-12h TO now]'
      fields => ["Name","sfdc_owner_name"]
output {
  elasticsearch {
    index => "salesforce-enriched"
    index_type => "opportunity"
    host => "localhost"

Of course you can add as many lookups as you want to connect Accounts, Opportunities, Contacts or whatever custom sObjects you might have as well.

Visualizing the Data with Kibana

Once the data is in Elasticsearch, the easiest way to start visualizing trends is to use Kibana. There are lots of fun things to look at but you can start by creating a vertical bar chart histogram and slicing the columns by a particular Salesforce field you are interested in. In the chart below we can see pipeline changes on a daily basis by stage. We noticed a drop in stage 2 pipeline on the 9th and because we have the data snapshots saved, we can easily see exactly which opportunities we changed at that time. This is just some sample data, but hopefully it gives you some ideas about how you can start slicing and dicing your pipeline.

Pipeline By Stage Over Time


Liberating your data from Salesforce and putting it into Elasticsearch will allow you to dive much deeper and see trends that you might have otherwise missed. Our new Logstash input plugin is a quick and easy way to pull your data out of Salesforce and start tinkering with it in Kibana. Please let us know what improvements you would like to see by filing issues in the Github repo and have a great time at Dreamforce!