OpenAI

Version: 2.1.0
Subscription level: Basic
Developed by: Elastic
Ingestion method(s): API
Minimum Kibana version(s): 9.0.0, 8.18.0

The OpenAI integration allows you to monitor OpenAI API usage metrics and collect organization audit logs. OpenAI is an AI research and deployment company that offers an API platform for its foundation models.

With the OpenAI integration, you can track API usage metrics across OpenAI models, as well as for vector stores and the code interpreter. You can also collect audit logs from the OpenAI platform to monitor user actions, API key lifecycle events, and organization configuration changes. You can use Kibana to visualize your data, create alerts when usage approaches its limits, view metrics while you troubleshoot issues, and analyze audit events for security and compliance. For example, you can track token usage and API calls per model, as well as login attempts, API key creation and deletion, and role assignments.

The OpenAI integration leverages two OpenAI APIs for data collection:

  • Usage API: The OpenAI Usage API delivers comprehensive insights into your API activity, helping you understand and optimize your organization's OpenAI API usage.

  • Audit Logs API: The OpenAI Audit Logs API collects organization audit logs, providing visibility into user actions, API key lifecycle events, login attempts, role assignments, and other platform activity for security oversight and compliance.

The OpenAI integration collects the following data streams:

  • audit: Collects organization audit logs.
  • audio_speeches: Collects audio speeches usage metrics.
  • audio_transcriptions: Collects audio transcriptions usage metrics.
  • code_interpreter_sessions: Collects code interpreter sessions usage metrics.
  • completions: Collects completions usage metrics.
  • embeddings: Collects embeddings usage metrics.
  • images: Collects images usage metrics.
  • moderations: Collects moderations usage metrics.
  • vector_stores: Collects vector stores usage metrics.
Note

Users can view OpenAI metrics in the logs-* index pattern using Kibana Discover.

You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it.

You need an OpenAI account with a valid Admin key for programmatic access to the OpenAI Usage API and OpenAI Audit Logs API. To fetch audit logs, you must enable audit logging on the OpenAI platform in your organization settings under Data controls > Data retention. Audit logs also require Organization Owner permissions.

For step-by-step instructions on how to set up an integration, see the Getting started guide.

To obtain an Admin key, create a new key or use an existing one from the Admin keys page. Use the Admin key to configure the OpenAI integration.
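Before entering the key in the integration, it can help to check that it works against the Usage API. The following is a minimal sketch, not part of the integration itself. It assumes the completions usage endpoint at `https://api.openai.com/v1/organization/usage/completions`, the `start_time` and `bucket_width` query parameters, and Bearer authentication. The function name `build_usage_request` and the `OPENAI_ADMIN_KEY` environment variable are hypothetical names chosen for illustration. The sketch only builds the request and does not send it.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for completions usage; verify against the OpenAI API reference.
USAGE_URL = "https://api.openai.com/v1/organization/usage/completions"


def build_usage_request(admin_key: str, start_time: int, bucket_width: str = "1m") -> Request:
    """Build (but do not send) a GET request for completions usage.

    start_time is a Unix timestamp in seconds; bucket_width is "1m", "1h", or "1d".
    """
    query = urlencode({"start_time": start_time, "bucket_width": bucket_width})
    return Request(
        f"{USAGE_URL}?{query}",
        headers={"Authorization": f"Bearer {admin_key}"},
    )


# Hypothetical environment variable name; the placeholder is not a real key.
key = os.environ.get("OPENAI_ADMIN_KEY", "sk-admin-placeholder")
request = build_usage_request(key, start_time=1735689600, bucket_width="1h")
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the request. A successful response suggests the key has the access that the usage data streams need.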

Among the configuration options for the OpenAI integration, the following settings are particularly relevant: "Initial interval" and "Bucket width" for usage metrics, and "Initial interval" and "Interval" for audit logs.

Initial interval:

  • Controls the historical data collection window at startup
  • Default value: 24 hours (24h)
  • Purpose: Loads historical context when you first set up the integration

A "bucket" refers to a time interval where OpenAI usage data is grouped together for reporting purposes. For example, with a 1-minute bucket width, usage metrics are aggregated minute by minute. With a 1-hour bucket width, all activity during that hour is consolidated into a single bucket. The bucket width determines your data's granularity and level of detail in your usage reporting.

Bucket width:

  • Controls the time-based aggregation of metrics
  • Default: 1m (1 minute)
  • Options: 1m (1 minute), 1h (1 hour), 1d (1 day)
  • Affects API request frequency and data resolution
  • 1m buckets provide the highest-resolution metrics, with data arriving in near real time (1-minute delay)
  • 1h buckets aggregate hourly, with data arriving less frequently (1-hour delay)
  • 1d buckets aggregate daily, with data arriving once per day (24-hour delay)

Data granularity relationship: 1m > 1h > 1d
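To make the idea of a bucket concrete, here is a small sketch of how timestamps could be grouped by bucket width. It illustrates time-based bucketing in general and is not the integration's or OpenAI's actual implementation. The names `bucket_start` and `count_per_bucket` are hypothetical, and the sketch assumes buckets aligned to UTC.

```python
from collections import Counter
from datetime import datetime, timezone

# Bucket widths offered by the integration, in seconds.
BUCKET_SECONDS = {"1m": 60, "1h": 3600, "1d": 86400}


def bucket_start(ts: datetime, width: str) -> datetime:
    """Return the start of the UTC-aligned bucket that contains ts."""
    size = BUCKET_SECONDS[width]
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def count_per_bucket(timestamps, width: str) -> Counter:
    """Count how many events fall into each bucket of the given width."""
    return Counter(bucket_start(ts, width) for ts in timestamps)


calls = [
    datetime(2025, 1, 1, 10, 0, 15, tzinfo=timezone.utc),
    datetime(2025, 1, 1, 10, 0, 45, tzinfo=timezone.utc),
    datetime(2025, 1, 1, 10, 30, 0, tzinfo=timezone.utc),
]
print(len(count_per_bucket(calls, "1m")))  # 2 buckets: 10:00 and 10:30
print(len(count_per_bucket(calls, "1h")))  # 1 bucket: 10:00
```

The same three calls land in two buckets at 1-minute width and one bucket at 1-hour width, which is the granularity trade-off described above.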

Bucket width choice affects storage usage (in Elasticsearch) and data resolution:

  • 1m: Maximum granularity, higher storage needs, ideal for detailed analysis.
  • 1h: Medium granularity, moderate storage needs, good for hourly patterns.
  • 1d: Minimum granularity, lowest storage needs, suitable for long-term analysis.

Example: For 100 API calls to a particular model per hour:

  • 1m buckets: Up to 100 documents
  • 1h buckets: 1 aggregated document
  • 1d buckets: 1 daily document

"Bucket width" and "Initial interval" directly affect API request frequency. When using a 1-minute bucket width, it is strongly recommended to set the "Initial interval" to a short duration, ideally 1 day, to keep collection running smoothly. In testing, a 6-month initial interval worked well with a 1-day bucket width, but the same was not achievable with 1-minute or 1-hour bucket widths. This is because the OpenAI Usage API returns a different number of buckets per call depending on the width (60 buckets per call for 1-minute, 24 for 1-hour, and 7 for 1-day widths), so narrower buckets need far more calls to cover the same period. When gathering historical data over long periods, a 1-day bucket width is the most effective choice, balancing data granularity against API limitations.

For optimal results with historical data, use 1-day bucket widths for long periods (15+ days), 1-hour for medium periods (1-15 days), and 1-minute only for the most recent 24 hours of data.
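The effect of bucket width on request volume can be estimated with simple arithmetic. The sketch below uses the buckets-per-call figures quoted above (60, 24, and 7) and assumes one call returns exactly that many buckets. It approximates six months as 180 days and counts requests for a single usage data stream. The function name `estimated_requests` is hypothetical, and actual request counts may differ.

```python
import math

BUCKET_SECONDS = {"1m": 60, "1h": 3600, "1d": 86400}
# Buckets returned per call, as quoted in the text above.
BUCKETS_PER_CALL = {"1m": 60, "1h": 24, "1d": 7}

DAY = 86400


def estimated_requests(initial_interval_seconds: int, width: str) -> int:
    """Estimate the calls needed to cover the initial interval for one data stream."""
    buckets = math.ceil(initial_interval_seconds / BUCKET_SECONDS[width])
    return math.ceil(buckets / BUCKETS_PER_CALL[width])


for width in ("1m", "1h", "1d"):
    one_day = estimated_requests(1 * DAY, width)
    six_months = estimated_requests(180 * DAY, width)  # assumption: 6 months is about 180 days
    print(f"{width}: {one_day} request(s) for 1 day, {six_months} for 180 days")
```

Under these assumptions, a 24-hour initial interval needs about 24 requests at 1-minute width, while 180 days needs about 4,320 requests at 1-minute width, 180 at 1-hour width, and 26 at 1-day width.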

With default settings (Interval: 5m, Bucket width: 1m, Initial interval: 24h), the OpenAI integration follows this collection pattern:

  1. Starts collection from (current_time - initial_interval)
  2. Collects data up to (current_time - bucket_width)
  3. Excludes the incomplete current bucket for data accuracy, waiting until that bucket is complete
  4. Runs every 5 minutes by default (configurable)
  5. From the second collection onward, starts from the end of the previously collected bucket and collects up to (current_time - bucket_width)

Example with default settings (Interval: 5m, Bucket width: 1m, Initial interval: 24h):

If the integration starts at 10:00 AM, the first collection covers the period from 10:00 AM the previous day through the 9:59 AM bucket of the current day. The next collection runs at 10:05 AM and covers the 10:00 AM bucket through the 10:04 AM bucket, because the "Interval" is 5 minutes.
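The collection pattern and the 10:00 AM example above can be sketched as a small function. This is an illustration of the described behavior, not the integration's actual code. The name `collection_window` and its parameters are hypothetical, and the sketch assumes each run happens exactly on a bucket boundary.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def collection_window(
    now: datetime,
    bucket_width: timedelta,
    initial_interval: timedelta,
    last_bucket: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return (first_bucket_start, last_bucket_start) for one collection run.

    last_bucket is the start of the final bucket collected by the previous run,
    or None on the first run.
    """
    if last_bucket is None:
        first = now - initial_interval      # first run: go back by the initial interval
    else:
        first = last_bucket + bucket_width  # later runs: resume where the last bucket ended
    last = now - bucket_width               # skip the bucket that is still in progress
    return first, last


minute = timedelta(minutes=1)
start_time = datetime(2025, 1, 2, 10, 0)

first_run = collection_window(start_time, minute, timedelta(hours=24))
print(first_run)   # 2025-01-01 10:00 through the 2025-01-02 09:59 bucket

second_run = collection_window(start_time + timedelta(minutes=5), minute,
                               timedelta(hours=24), last_bucket=first_run[1])
print(second_run)  # the 10:00 bucket through the 10:04 bucket
```

Tracking the last collected bucket, rather than the wall-clock time of the previous run, is what keeps consecutive windows from overlapping or leaving gaps in this sketch.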

ECS Field Reference

Refer to this document for detailed information on ECS fields.

The audit data stream captures organization audit logs.

The audio_speeches data stream captures audio speeches usage metrics.

The audio_transcriptions data stream captures audio transcriptions usage metrics.

The code_interpreter_sessions data stream captures code interpreter sessions usage metrics.

The completions data stream captures completions usage metrics.

The embeddings data stream captures embeddings usage metrics.

The images data stream captures images usage metrics.

The moderations data stream captures moderations usage metrics.

The vector_stores data stream captures vector stores usage metrics.

This integration includes one or more Kibana dashboards that visualize the data collected by the integration. The screenshots below illustrate how the ingested data is displayed.