Transforming the eCommerce sample data

Warning

This functionality is in beta and is subject to change. The design and code are less mature than official GA features and are being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.

Data frame transforms enable you to retrieve information from an Elasticsearch index, transform it, and store it in another index. Let’s use the Kibana sample data to demonstrate how you can pivot and summarize your data with data frame transforms.

  1. If the Elasticsearch security features are enabled, obtain a user ID with sufficient privileges to complete these steps.

    You need the manage_data_frame_transforms cluster privilege to preview and create data frame transforms. Members of the built-in data_frame_transforms_admin role have this privilege.

    You also need read and view_index_metadata index privileges on the source index and read, create_index, and index privileges on the destination index.

    For more information, see Security privileges and Built-in roles.
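
    You might grant these privileges through a custom role rather than the built-in data_frame_transforms_admin role. In that case, the requirements above map onto a role definition. The following Python sketch builds such a body for the create or update roles API (PUT _security/role/<name>). The role name ecommerce_transform_user is hypothetical. The destination index name matches the one used later in this example. Check the body against the security API reference for your version before using it.

```python
import json

# Hypothetical role name; choose any name that suits your deployment.
ROLE_NAME = "ecommerce_transform_user"
SOURCE_INDEX = "kibana_sample_data_ecommerce"
DEST_INDEX = "ecommerce-customers"  # destination index used later in this example


def transform_role_body(source_index: str, dest_index: str) -> dict:
    """Return a role body granting the privileges listed in step 1."""
    return {
        # Cluster privilege needed to preview and create data frame transforms.
        "cluster": ["manage_data_frame_transforms"],
        "indices": [
            # Read the source documents and their mappings.
            {"names": [source_index], "privileges": ["read", "view_index_metadata"]},
            # Create the destination index and write the pivoted documents.
            {"names": [dest_index], "privileges": ["read", "create_index", "index"]},
        ],
    }


body = transform_role_body(SOURCE_INDEX, DEST_INDEX)
print(f"PUT _security/role/{ROLE_NAME}")
print(json.dumps(body, indent=2))
```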

  2. Choose your source index.

    In this example, we’ll use the eCommerce orders sample data. If you’re not already familiar with the kibana_sample_data_ecommerce index, use the Revenue dashboard in Kibana to explore the data. Consider what insights you might want to derive from this eCommerce data.

  3. Play with various options for grouping and aggregating the data.

    For example, you might want to group the data by product ID and calculate the total number of sales for each product and its average price. Alternatively, you might want to look at the behavior of individual customers and calculate how much each customer spent in total and how many different categories of products they purchased. Or you might want to take the currencies or geographies into consideration. What are the most interesting ways you can transform and interpret this data?

    Pivoting your data involves using at least one field to group it and applying at least one aggregation. You can preview what the transformed data will look like, so go ahead and play with it!

    For example, go to Machine Learning > Data Frames in Kibana and use the wizard to create a data frame transform:

    Creating a simple data frame transform in Kibana

    In this case, we grouped the data by customer ID and calculated the total quantity of products that each customer purchased (the sum of the total_quantity field).

    Let’s add some more aggregations to learn more about our customers' orders. For example, let’s calculate the total sum of their purchases, the maximum number of products that they purchased in a single order, and their total number of orders. We’ll accomplish this by using the sum aggregation on the taxless_total_price field, the max aggregation on the total_quantity field, and the cardinality aggregation on the order_id field:

    Adding multiple aggregations to a data frame transform in Kibana

    Tip

    If you’re interested in a subset of the data, you can optionally include a query element. In this example, we’ve filtered the data so that we’re only looking at orders with a currency of EUR. Alternatively, we could group the data by that field too. If you want to use more complex queries, you can create your data frame from a saved search.

    If you prefer, you can use the preview data frame transforms API:

    POST _data_frame/transforms/_preview
    {
      "source": {
        "index": "kibana_sample_data_ecommerce",
        "query": {
          "bool": {
            "filter": {
              "term": {"currency": "EUR"}
            }
          }
        }
      },
      "pivot": {
        "group_by": {
          "customer_id": {
            "terms": {
              "field": "customer_id"
            }
          }
        },
        "aggregations": {
          "total_quantity.sum": {
            "sum": {
              "field": "total_quantity"
            }
          },
          "taxless_total_price.sum": {
            "sum": {
              "field": "taxless_total_price"
            }
          },
          "total_quantity.max": {
            "max": {
              "field": "total_quantity"
            }
          },
          "order_id.cardinality": {
            "cardinality": {
              "field": "order_id"
            }
          }
        }
      }
    }
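
    It can help to reproduce the pivot on a handful of documents outside Elasticsearch to see what it computes. The Python sketch below is only an illustration and is not how the transform is implemented. It applies the same EUR filter, groups by customer_id, and computes the sum, max, and cardinality aggregations from the preview request. The order records are made up and contain only the fields referenced above.

```python
from collections import defaultdict

# Hypothetical order documents; the real sample data has many more fields.
orders = [
    {"order_id": "o1", "customer_id": "c1", "currency": "EUR", "total_quantity": 2, "taxless_total_price": 40.0},
    {"order_id": "o2", "customer_id": "c1", "currency": "EUR", "total_quantity": 4, "taxless_total_price": 85.5},
    {"order_id": "o3", "customer_id": "c2", "currency": "EUR", "total_quantity": 1, "taxless_total_price": 15.0},
    {"order_id": "o4", "customer_id": "c2", "currency": "USD", "total_quantity": 3, "taxless_total_price": 60.0},
]


def pivot(docs):
    """Group EUR orders by customer_id and aggregate like the preview request."""
    groups = defaultdict(list)
    for doc in docs:
        # Local equivalent of the bool/filter/term query on currency.
        if doc["currency"] == "EUR":
            groups[doc["customer_id"]].append(doc)

    rows = []
    for customer_id, group in sorted(groups.items()):
        rows.append({
            "customer_id": customer_id,
            "total_quantity.sum": sum(d["total_quantity"] for d in group),
            "taxless_total_price.sum": sum(d["taxless_total_price"] for d in group),
            "total_quantity.max": max(d["total_quantity"] for d in group),
            # Exact distinct count; the cardinality aggregation is approximate at scale.
            "order_id.cardinality": len({d["order_id"] for d in group}),
        })
    return rows


for row in pivot(orders):
    print(row)
```

    The filter drops customer c2's USD order, just as the query element excludes it from the preview. Each customer then becomes one row with one column per aggregation.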

  4. When you are satisfied with what you see in the preview, create the data frame transform.

    1. Supply a transform ID and the name of the target (or destination) index.
    2. Decide whether you want the data frame transform to run once or continuously.

    Since this sample data index is unchanging, let’s use the default behavior and just run the data frame transform once.

    Specifying the data frame transform options in Kibana

    If you want to try it out, however, go ahead and click on Continuous mode. You must choose a field that the data frame transform can use to check which entities have changed. In general, it’s a good idea to use the ingest timestamp field. In this example, however, you can use the order_date field.

    If you prefer, you can use the create data frame transforms API. For example:

    PUT _data_frame/transforms/ecommerce-customer-transform
    {
      "source": {
        "index": [
          "kibana_sample_data_ecommerce"
        ],
        "query": {
          "bool": {
            "filter": {
              "term": {
                "currency": "EUR"
              }
            }
          }
        }
      },
      "pivot": {
        "group_by": {
          "customer_id": {
            "terms": {
              "field": "customer_id"
            }
          }
        },
        "aggregations": {
          "total_quantity.sum": {
            "sum": {
              "field": "total_quantity"
            }
          },
          "taxless_total_price.sum": {
            "sum": {
              "field": "taxless_total_price"
            }
          },
          "total_quantity.max": {
            "max": {
              "field": "total_quantity"
            }
          },
          "order_id.cardinality": {
            "cardinality": {
              "field": "order_id"
            }
          }
        }
      },
      "dest": {
        "index": "ecommerce-customers"
      }
    }
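
    The create request body is the preview request body plus a dest section. The Python sketch below makes that relationship explicit by deriving one from the other. For continuous mode, it optionally adds a sync.time block that names the field used to detect changed entities (order_date, as suggested above). The create_body helper is hypothetical and the delay value is an arbitrary example. Check the sync settings against the create data frame transforms API reference for your version.

```python
import copy
import json

# Preview body from step 3, shortened to a single aggregation for brevity.
preview_body = {
    "source": {
        "index": "kibana_sample_data_ecommerce",
        "query": {"bool": {"filter": {"term": {"currency": "EUR"}}}},
    },
    "pivot": {
        "group_by": {"customer_id": {"terms": {"field": "customer_id"}}},
        "aggregations": {
            "taxless_total_price.sum": {"sum": {"field": "taxless_total_price"}},
        },
    },
}


def create_body(preview, dest_index, sync_field=None, delay="60s"):
    """Hypothetical helper: turn a preview body into a create body."""
    body = copy.deepcopy(preview)  # leave the preview body untouched
    body["dest"] = {"index": dest_index}
    if sync_field is not None:
        # Continuous mode: use this timestamp field to find changed entities.
        body["sync"] = {"time": {"field": sync_field, "delay": delay}}
    return body


batch = create_body(preview_body, "ecommerce-customers")
continuous = create_body(preview_body, "ecommerce-customers", sync_field="order_date")
print("PUT _data_frame/transforms/ecommerce-customer-transform")
print(json.dumps(batch, indent=2))
```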

  5. Start the data frame transform.

    Tip

    Even though resource utilization is automatically adjusted based on the cluster load, a data frame transform increases search and indexing load on your cluster while it runs. If you experience excessive load, you can stop it.

    You can start, stop, and manage data frame transforms in Kibana:

    Managing data frame transforms in Kibana

    Alternatively, you can use the start data frame transforms and stop data frame transforms APIs. For example:

    POST _data_frame/transforms/ecommerce-customer-transform/_start
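
    If you script these steps instead of using the Kibana console, the start and stop calls are plain HTTP POST requests. The Python sketch below only constructs the requests with the standard library's urllib.request.Request. The cluster URL http://localhost:9200 is an assumption, and the sketch omits authentication. To actually send a request, pass it to urllib.request.urlopen.

```python
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumed cluster address; adjust for your deployment
TRANSFORM_ID = "ecommerce-customer-transform"


def transform_action(transform_id: str, action: str) -> Request:
    """Build (but do not send) a start or stop request for a data frame transform."""
    if action not in ("_start", "_stop"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{ES_URL}/_data_frame/transforms/{transform_id}/{action}"
    return Request(url, method="POST")


start = transform_action(TRANSFORM_ID, "_start")
stop = transform_action(TRANSFORM_ID, "_stop")
print(start.get_method(), start.full_url)
print(stop.get_method(), stop.full_url)
```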

  6. Explore the data in your new index.

    For example, use the Discover application in Kibana:

    Exploring the new index in Kibana

Tip

If you do not want to keep the data frame transform, you can delete it in Kibana or use the delete data frame transform API. When you delete a data frame transform, its destination index and Kibana index patterns remain.
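
The delete data frame transform API is an HTTP DELETE on the transform's endpoint. The API documentation for these releases says a transform must be stopped before it can be deleted. For that reason, the hypothetical delete_requests helper below returns a stop request followed by a delete request. It only constructs them with urllib.request.Request and does not send them. The cluster URL is an assumption, and the sketch omits authentication.

```python
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumed cluster address


def delete_requests(transform_id: str) -> list:
    """Hypothetical helper: stop the transform, then delete it (send in this order)."""
    base = f"{ES_URL}/_data_frame/transforms/{transform_id}"
    return [
        Request(f"{base}/_stop", method="POST"),
        # Deleting the transform leaves its destination index in place.
        Request(base, method="DELETE"),
    ]


for req in delete_requests("ecommerce-customer-transform"):
    print(req.get_method(), req.full_url)
```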