Large language model performance matrix

This table describes the performance of various large language models (LLMs) for different use cases in Elastic Security, based on our internal testing. To learn more about these use cases, refer to Attack discovery or AI Assistant.

| Feature | Claude 3: Opus | Claude 3.5: Sonnet v2 | Claude 3.5: Sonnet | Claude 3.5: Haiku | Claude 3: Haiku | GPT-4o | GPT-4o-mini | Gemini 1.5 Pro 002 | Gemini 1.5 Flash 002 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Assistant - General | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent |
| Assistant - ES\|QL generation | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent | Poor |
| Assistant - Alert questions | Excellent | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent | Good |
| Assistant - Knowledge retrieval | Good | Excellent | Excellent | Excellent | Excellent | Excellent | Great | Excellent | Excellent |
| Attack Discovery | Great | Great | Excellent | Poor | Poor | Great | Poor | Excellent | Poor |

Excellent is the best rating, followed by Great, then by Good, and finally by Poor.
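
To compare models programmatically, the rating order can be treated as an ordinal scale. The following is a minimal sketch, not part of Elastic Security: the names `Rating`, `MODELS`, `MATRIX`, and `models_rated_at_least` are hypothetical, and the data is transcribed by hand from the table on this page.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Ordinal scale from this page: Excellent > Great > Good > Poor."""
    POOR = 1
    GOOD = 2
    GREAT = 3
    EXCELLENT = 4


# Column order of the matrix (hypothetical constant, copied from the table).
MODELS = [
    "Claude 3: Opus", "Claude 3.5: Sonnet v2", "Claude 3.5: Sonnet",
    "Claude 3.5: Haiku", "Claude 3: Haiku", "GPT-4o", "GPT-4o-mini",
    "Gemini 1.5 Pro 002", "Gemini 1.5 Flash 002",
]

# Short aliases to keep the rows readable.
E, GR, GO, P = Rating.EXCELLENT, Rating.GREAT, Rating.GOOD, Rating.POOR

# One row per feature; each list follows the order of MODELS.
MATRIX = {
    "Assistant - General":             [E, E, E, E, E, E, E, E, E],
    "Assistant - ES|QL generation":    [E, E, E, E, E, E, GR, E, P],
    "Assistant - Alert questions":     [E, E, E, E, E, E, GR, E, GO],
    "Assistant - Knowledge retrieval": [GO, E, E, E, E, E, GR, E, E],
    "Attack Discovery":                [GR, GR, E, P, P, GR, P, E, P],
}


def models_rated_at_least(feature, minimum):
    """Return the models whose rating for `feature` is `minimum` or better."""
    return [m for m, r in zip(MODELS, MATRIX[feature]) if r >= minimum]


if __name__ == "__main__":
    for model in models_rated_at_least("Attack Discovery", Rating.EXCELLENT):
        print(model)
```

Using `IntEnum` keeps the comparison `r >= minimum` aligned with the stated order, so a threshold such as `Rating.GREAT` also admits every model rated Excellent.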