Finding air carriers with the most delaysedit

In this example, we use the Flights sample dataset to find out which air carrier had the most delays. First, we filter the source data such that it excludes all the cancelled flights by using a query filter. Then we transform the data to contain the distinct number of flights, the sum of delayed minutes, and the sum of the flight minutes by air carrier. Finally, we use a bucket_script to determine what percentage of the flight time was actually delay.

POST _data_frame/transforms/_preview
{
  "source": {
    "index": "kibana_sample_data_flights",
    "query": { 
      "bool": {
        "filter": [
          { "term":  { "Cancelled": false } }
        ]
      }
    }
  },
  "dest" : { 
    "index" : "sample_flight_delays_by_carrier"
  },
  "pivot": {
    "group_by": { 
      "carrier": { "terms": { "field": "Carrier" }}
    },
    "aggregations": {
      "flights_count": { "value_count": { "field": "FlightNum" }},
      "delay_mins_total": { "sum": { "field": "FlightDelayMin" }},
      "flight_mins_total": { "sum": { "field": "FlightTimeMin" }},
      "delay_time_percentage": { 
        "bucket_script": {
          "buckets_path": {
            "delay_time": "delay_mins_total.value",
            "flight_time": "flight_mins_total.value"
          },
          "script": "(params.delay_time / params.flight_time) * 100"
        }
      }
    }
  }
}

Filter the source data to select only flights that were not cancelled.

This is the destination index for the data frame. It is ignored by _preview.

The data is grouped by the Carrier field which contains the airline name.

This bucket_script performs calculations on the results that are returned by the aggregation. In this particular example, it calculates what percentage of travel time was taken up by delays.

The preview shows you that the new index would contain data like this for each carrier:

{
  "preview" : [
    {
      "carrier" : "ES-Air",
      "flights_count" : 2802.0,
      "flight_mins_total" : 1436927.5130677223,
      "delay_time_percentage" : 9.335543983955839,
      "delay_mins_total" : 134145.0
    },
    ...
  ]
}

This data frame makes it easier to answer questions such as:

  • Which air carrier has the most delays as a percentage of flight time?
Note

This data is fictional and does not reflect actual delays or flight stats for any of the featured destination or origin airports.