Finding your best customersedit

In this example, we use the eCommerce orders sample dataset to find the customers who spent the most in our hypothetical webshop. Let’s transform the data such that the destination index contains the number of orders, the total price of the orders, the amount of unique products and the average price per order, and the total amount of ordered products for each customer.

POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "dest" : { 
    "index" : "sample_ecommerce_orders_by_customer"
  },
  "pivot": {
    "group_by": { 
      "user": { "terms": { "field": "user" }},
      "customer_id": { "terms": { "field": "customer_id" }}
    },
    "aggregations": {
      "order_count": { "value_count": { "field": "order_id" }},
      "total_order_amt": { "sum": { "field": "taxful_total_price" }},
      "avg_amt_per_order": { "avg": { "field": "taxful_total_price" }},
      "avg_unique_products_per_order": { "avg": { "field": "total_unique_products" }},
      "total_unique_products": { "cardinality": { "field": "products.product_id" }}
    }
  }
}

This is the destination index for the data frame. It is ignored by _preview.

Two group_by fields have been selected. This means the data frame will contain a unique row per user and customer_id combination. Within this dataset both these fields are unique. By including both in the data frame it gives more context to the final results.

Note

In the example above, condensed JSON formatting has been used for easier readability of the pivot object.

The preview data frame transforms API enables you to see the layout of the data frame in advance, populated with some sample values. For example:

{
  "preview" : [
    {
      "total_order_amt" : 3946.9765625,
      "order_count" : 59.0,
      "total_unique_products" : 116.0,
      "avg_unique_products_per_order" : 2.0,
      "customer_id" : "10",
      "user" : "recip",
      "avg_amt_per_order" : 66.89790783898304
    },
    ...
    ]
  }

This data frame makes it easier to answer questions such as:

  • Which customers spend the most?
  • Which customers spend the most per order?
  • Which customers order most often?
  • Which customers ordered the least number of different products?

It’s possible to answer these questions using aggregations alone, however data frames allow us to persist this data as a customer centric index. This enables us to analyze data at scale and gives more flexibility to explore and navigate data from a customer centric perspective. In some cases, it can even make creating visualizations much simpler.