5 criteria for choosing a vector database that will survive production

Most teams pick a vector database the same way: they find approximate nearest neighbor (ANN) benchmarks, sort by recall@10, check GitHub stars, run a proof of concept on 50,000 sample vectors, and ship it. Six months later, they're standing up a separate Elasticsearch cluster because their chosen database has no BM25 index. Or they're blocked at security review because there's no document-level access control and an enterprise customer requires per-user data isolation. Or the sync job that feeds documents from Confluence and SharePoint into the vector store breaks every time a document is updated, and two engineers are now maintaining that pipeline instead of building features. This blog covers the five criteria to evaluate before you commit, not after.

What makes a vector database different from a traditional database

A traditional database matches exact values. You query for user_id = 42 or status = 'active' and the database finds rows where the data matches precisely. Relational databases, key-value stores, and document databases all work this way. The index structures they use, such as B-trees and hash maps, are built for exact lookups.

A vector database retrieves by semantic similarity. Instead of querying for an exact value, you provide a vector (a list of floating-point numbers representing the meaning of your query) and the database finds the records whose vectors are closest to yours in high-dimensional space. Those vectors come from embedding models: neural networks that compress the meaning of text, images, or other data into dense numerical representations. Phrases like "how do I reset my password" and "forgot credentials" produce vectors that sit close together in that space, even though they share no words.
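To make "closest in high-dimensional space" concrete, here is a minimal brute-force sketch using cosine similarity. The three-dimensional vectors and document IDs are invented for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, records, k=1):
    """Exact search: score every stored vector against the query, keep the top k."""
    ranked = sorted(
        records,
        key=lambda record: cosine_similarity(query, record[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy vectors, invented for this example (not output from a real embedding model).
records = [
    ("reset-password-howto", [0.9, 0.1, 0.0]),
    ("forgot-credentials", [0.8, 0.2, 0.1]),
    ("quarterly-revenue", [0.0, 0.1, 0.9]),
]
query = [0.85, 0.15, 0.05]
top2 = nearest(query, records, k=2)
print(top2)
```

This exact scan touches every record, which is why it stops being practical as the collection grows and why approximate indexes exist.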

The most widely used index structure for this is Hierarchical Navigable Small World (HNSW). It builds a multi-layer graph that links each vector to its near neighbors, so a query can navigate toward the closest vectors without scanning the whole collection. "Approximate" is the key word: HNSW gives up a small amount of recall in exchange for dramatically faster queries, and the recall-throughput tradeoff is the first evaluation criterion in this blog.

Elasticsearch combines both approaches. You can run exact-match structured queries and vector similarity search in the same system against the same documents in the same query with results merged by reciprocal rank fusion (RRF).

Criterion 1: Recall and throughput measured on your data

Recall@k is the most important retrieval quality metric. It asks the question: of the k most relevant documents (computed by brute-force exact search), how many did the approximate search actually return? A recall@10 of 0.95 means the HNSW index found 9.5 of the 10 correct results on average. The gap between 0.95 and 0.99 sounds small. In a production retrieval-augmented generation (RAG) pipeline, it is the difference between missing one relevant document in 20 and one in 100. Because those misses are spread across queries, they degrade answer quality in ways that are hard to debug.
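As a rough sketch of the definition above, recall@k can be computed by comparing the IDs an approximate search returned with the IDs from exact search. The function names and toy ID lists here are illustrative, not part of any library.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    found = truth.intersection(approx_ids[:k])
    return len(found) / k

def mean_recall_at_k(approx_results, exact_results, k):
    """Average recall@k over a batch of queries (the two lists are aligned per query)."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)

# Two toy queries with k=10: the first is perfect, the second misses one neighbor.
exact = [list(range(10)), list(range(100, 110))]
approx = [list(range(10)), list(range(100, 109)) + [999]]
mean_recall = mean_recall_at_k(approx, exact, k=10)
print(f"{mean_recall:.2f}")  # 0.95
```

Note that an average of 0.95 hides the per-query picture: here one of the two queries is missing a relevant document entirely.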

Queries per second (QPS) is the throughput dimension. A system with recall@10 of 0.99 and 50 QPS isn't useful if your application needs 500 QPS. Both metrics matter, and they trade against each other: raising the HNSW ef_search parameter (exposed as num_candidates in Elasticsearch) improves recall and lowers throughput. The right operating point depends on your workload.

The ANN Benchmarks project publishes reproducible recall-vs-QPS curves across algorithms and datasets. It is a good starting point for understanding the general landscape, but it uses standard benchmark datasets, not your data, and distribution shift matters. A dataset of product catalog embeddings from a fine-tuned model will behave very differently from the GloVe or SIFT vectors in the benchmarks. Elasticsearch Labs publishes reproducible benchmark configurations covering HNSW settings and quantization levels, including int8 and binary quantization, which you can use as a baseline for your own tests.
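To see why quantization level belongs in the benchmark, a back-of-the-envelope estimate of raw vector storage helps. This sketch assumes 768-dimensional embeddings and 10 million vectors (both numbers are placeholders) and counts only the vector payload. It ignores HNSW graph links, any retained original vectors, and other index overhead, all of which vary by system.

```python
# Bits stored per dimension for each encoding.
BITS_PER_DIM = {"float32": 32, "int8": 8, "binary": 1}

def raw_vector_gb(num_vectors, dims, encoding):
    """Raw payload size in GB (10^9 bytes), excluding graph and index overhead."""
    bytes_per_vector = dims * BITS_PER_DIM[encoding] / 8
    return num_vectors * bytes_per_vector / 1e9

NUM_VECTORS = 10_000_000  # assumed corpus size
DIMS = 768                # assumed embedding dimensionality

footprint_gb = {enc: raw_vector_gb(NUM_VECTORS, DIMS, enc) for enc in BITS_PER_DIM}
for enc, gb in footprint_gb.items():
    ratio = footprint_gb["float32"] / gb
    print(f"{enc:8s} {gb:6.2f} GB raw vectors ({ratio:.0f}x reduction)")
```

Under these assumptions the payload drops from roughly 31 GB to under 8 GB with int8 and under 1 GB with binary, which is why the recall penalty at each level is worth measuring on your own data.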

Evaluation protocol:

  1. Sample 10,000–100,000 vectors from your actual production data (or your closest available proxy).

  2. Compute ground-truth nearest neighbors using brute-force exact search. This is your recall ceiling.

  3. Index those vectors into each candidate system using the default configuration.

  4. Measure recall@10 and QPS at your target p99 latency (e.g., under 100ms).

  5. Tune the HNSW parameters — num_candidates, ef_construction, m — and repeat.

  6. Check how index size and memory consumption scale as you increase to your full production volume.

The number that matters most is not peak recall or peak QPS in isolation. It is the recall you can achieve at your target QPS.
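The protocol above can be sketched as a small harness. This is an illustration, not a benchmark tool: `search_fn` is a hypothetical callable that you would wire to each candidate system, ground truth uses dot-product similarity as an assumption, and QPS is measured from a single sequential client, so it understates what a concurrent load test would show.

```python
import random
import time

def exact_top_k(query, vectors, k):
    """Brute-force ground truth: score every vector by dot product, keep the best k."""
    order = sorted(
        range(len(vectors)),
        key=lambda i: sum(q * v for q, v in zip(query, vectors[i])),
        reverse=True,
    )
    return order[:k]

def evaluate(search_fn, queries, vectors, k=10):
    """Run each query through search_fn; report recall@k, QPS, and p99 latency."""
    latencies, recalls = [], []
    for query in queries:
        truth = set(exact_top_k(query, vectors, k))  # not included in the timing
        start = time.perf_counter()
        returned = search_fn(query, k)
        latencies.append(time.perf_counter() - start)
        recalls.append(len(truth.intersection(returned)) / k)
    latencies.sort()
    p99_index = min(len(latencies) - 1, int(0.99 * len(latencies)))
    return {
        "recall_at_k": sum(recalls) / len(recalls),
        "qps": len(queries) / max(sum(latencies), 1e-12),
        "p99_latency_ms": latencies[p99_index] * 1000,
    }

# Synthetic stand-in data; replace with a sample of your production vectors.
random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(16)] for _ in range(500)]
queries = [[random.gauss(0, 1) for _ in range(16)] for _ in range(20)]

def stand_in_search(query, k):
    """Placeholder for a real client call; here it simply reuses exact search."""
    return exact_top_k(query, vectors, k)

report = evaluate(stand_in_search, queries, vectors, k=10)
print(report)
```

Rerunning `evaluate` after each parameter change gives you the recall-at-target-QPS number the protocol is after, on the same queries and the same ground truth.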

Criterion 2: Hybrid search capability

Pure vector search retrieves by semantic similarity. That works for intent-matching queries: "what are the side effects of this medication," "find documents similar to this one," "show me products like the one I just viewed." But real applications have requirements that vector search alone cannot satisfy.

Consider a legal document search system. A user searches for "termination clause" with a filter jurisdiction: 'California' and a date range filed_after: '2024-01-01'. The semantic component finds documents about termination clauses. The filter restricts results to California filings made after January 1, 2024. No amount of semantic similarity replaces that filter; it is an exact requirement.

The pattern appears constantly in production:

  • E-commerce: semantic product discovery + brand and price filters

  • Healthcare: clinical notes retrieval + patient cohort filters

  • Code search: semantic function search + language and repository filters

  • Customer support: semantic ticket similarity + priority and status filters

Reciprocal rank fusion (RRF) is a widely used approach for merging vector and keyword results. It scores each document by the reciprocal of its rank in each sub-search, offset by a constant, then sums those scores across sub-searches. The formula is score(d) = Σ_i 1/(k + rank_i(d)), where rank_i(d) is the document's position in result list i and k is a smoothing constant (typically 60). RRF is often preferred over linear score combination because it uses only ranks, which makes it robust to the scale differences between BM25 scores and vector similarity scores.
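A minimal sketch of that formula, assuming each sub-search returns an ordered list of document IDs. The function name and the toy result lists are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}  {score:.5f}")
```

Documents that appear in both lists (doc_a and doc_c here) rise above documents that rank well in only one, and no raw BM25 or similarity score ever enters the calculation.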

The query flow in a hybrid search system:

flowchart LR
    Q[User query] --> V[Vector search HNSW]
    Q --> L[Lexical search BM25]
    Q --> F[Filters / structured queries]
    F --> V
    F --> L
    V --> R[RRF merger]
    L --> R
    R --> RE[Reranker - optional]
    RE --> OUT[Results]

When evaluating a vector database for hybrid search, ask yourself:

  • Does it support native BM25 without a separate search service?

  • Can you apply metadata filters alongside vector search in a single query, or does filtering require a pre- or post-processing step?

  • Does it support structured aggregations like faceted counts and date histograms in the same request?

  • Is hybrid search a first-class API feature or a workaround built on top of the vector API?

Elasticsearch handles all of these natively. Dedicated vector databases like Pinecone and Qdrant are built around vector search with metadata filtering rather than a native BM25 inverted index, so check each product's current documentation for what keyword support it offers. Where that support is missing, your application needs a second system for keyword search (typically Elasticsearch or OpenSearch), or you must implement keyword matching at the application layer. The dual-stack approach works, but it means two systems to deploy, monitor, and keep in sync.
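For the legal search example earlier in this section, a single hybrid request might look like the sketch below. It builds a search body using the Elasticsearch retriever syntax (an `rrf` retriever wrapping a `standard` and a `knn` retriever) as found in recent 8.x and later releases, so verify it against the version you run. The field names `body`, `body_vector`, `jurisdiction`, and `filed_date` are hypothetical, and the three-element query vector is a placeholder for a real embedding.

```python
import json

def build_hybrid_query(text, query_vector, jurisdiction, filed_after, size=10):
    """Build a search body that fuses BM25 and kNN results with RRF under shared filters."""
    # The same exact-match filters constrain both sub-searches.
    filters = [
        {"term": {"jurisdiction": jurisdiction}},
        {"range": {"filed_date": {"gt": filed_after}}},
    ]
    lexical = {"standard": {"query": {"match": {"body": text}}, "filter": filters}}
    vector = {
        "knn": {
            "field": "body_vector",
            "query_vector": query_vector,
            "k": size,
            "num_candidates": 100,
            "filter": filters,
        }
    }
    return {
        "size": size,
        "retriever": {
            "rrf": {
                "retrievers": [lexical, vector],
                "rank_constant": 60,
                "rank_window_size": 50,
            }
        },
    }

body = build_hybrid_query("termination clause", [0.1, 0.2, 0.3], "California", "2024-01-01")
print(json.dumps(body, indent=2))
```

The printed JSON is what you would send to the `_search` endpoint of the index. The point of the shape is that filtering, lexical scoring, vector scoring, and fusion all travel in one request.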

Criterion 3: Security and enterprise controls

Security requirements rarely surface during the prototype phase. They appear during procurement review, during a compliance audit, or when your first enterprise customer asks whether their documents are isolated from other tenants. By that point, migrating to a system with stronger security is expensive.

The table below shows the six security capabilities that determine whether a vector database is viable for enterprise deployments:

| Capability | Why it matters | Elasticsearch | Typical dedicated vector DB |
| --- | --- | --- | --- |
| Field-level security | A nurse should see a patient name but not their SSN in the same document. | Yes, enforced per role per index | Not available |
| Document-level security | A tenant's documents must not appear in another tenant's query results; this is enforced at query time. | Yes, query-time enforcement | Not available |
| Role-based access control (RBAC) | Different users and services need different index and cluster permissions. | Yes, built-in with Kibana integration | Basic API key only |
| Encryption at rest and in transit | TLS for all traffic; encrypted storage for data at rest | Yes, default on Elastic Cloud | Varies; but often, yes, on managed cloud plans |
| Audit logging | Regulated industries must produce a record of every document retrieval and admin action. | Yes, structured and filterable | Limited or not available |
| Air-gapped on-prem deployment | Some government and financial environments have no external network access. | Yes | Not available for Pinecone (SaaS only); limited for others |

For financial services, healthcare, and government deployments, document-level security and audit logging are requirements, not nice-to-haves. A hospital using a RAG system over clinical notes must ensure that a query from a nurse practitioner cannot retrieve records outside their patient panel. A financial institution must produce an audit trail of every document retrieval for regulatory review.

Pinecone, Qdrant, and Weaviate provide API key authentication and namespace isolation. Those are appropriate for single-tenant applications or internal tools. They are not equivalent to document-level security enforced at query time.

Milvus can be self-hosted, which gives you more flexibility to add security layers at the infrastructure level. Those controls are still not native to the query engine itself.
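To illustrate what query-time enforcement looks like in practice, here is a sketch of an Elasticsearch role definition that combines document-level and field-level security. It builds the request body you would send to the role API (`PUT /_security/role/<role_name>`). The index pattern, the `tenant_id` field, and the `ssn` field are hypothetical, and these security features may depend on your subscription level, so treat this as a starting point to check against the security documentation.

```python
import json

def tenant_role(tenant_id, index_pattern="clinical-notes-*"):
    """Role body: read-only access limited to one tenant's documents, with ssn hidden."""
    return {
        "indices": [
            {
                "names": [index_pattern],
                "privileges": ["read"],
                # Document-level security: only documents matching this query are visible.
                "query": {"term": {"tenant_id": tenant_id}},
                # Field-level security: every field except ssn is readable.
                "field_security": {"grant": ["*"], "except": ["ssn"]},
            }
        ]
    }

role_body = tenant_role("hospital-a")
print(json.dumps(role_body, indent=2))
```

Because the restriction lives in the role rather than in application code, every search issued by a user with that role is constrained, including queries the application authors never anticipated.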

Criterion 4: Integrated platform vs. point solution

This is the one criterion where point solutions have a genuine advantage, and you should take it seriously.

Dedicated vector databases like Pinecone, Qdrant, Milvus, and Weaviate are simpler to get started with. The Pinecone quickstart is under 20 lines of Python. Qdrant's in-memory mode lets you develop locally without any infrastructure. Milvus has a strong community and extensive documentation for large-scale deployments. If your workload is purely vector — semantic similarity search with no keyword requirements, no complex per-user access controls, and no multitenant data isolation — a dedicated vector database is a reasonable choice with lower initial complexity.

The cost appears at scale and when requirements broaden.

When your application adds a keyword search requirement, you need a second system. When you need multitenant isolation, you have to build it at the application layer. When the data pipeline needs to sync from 12 different sources, such as SharePoint, Jira, Confluence, S3, and MongoDB, you have to build and maintain that pipeline for two systems. Two systems means two monitoring setups, two upgrade cycles, and two failure modes to debug when search quality degrades at 2 a.m.

The architecture comparison:

| Requirement | Point solution | Integrated platform |
| --- | --- | --- |
| Vector search only | Simpler to start, well-optimized | Fully supported |
| Vector + keyword hybrid | Requires dual-stack | Native, single query |
| Multitenant data isolation | Application-layer workaround | Document-level security |
| Production AI at scale | Works for pure-vector workloads | Works across mixed workloads |
| Air-gapped or on-prem | Limited (Milvus self-hosted; Pinecone SaaS-only) | Full on-prem support |

The honest framing: if you are building a single-tenant semantic search feature with no keyword requirements, Pinecone or Qdrant will get you live faster. If you are building a production AI application that handles multiple tenants, mixed query types, regulated data, or enterprise connectors, the integration cost of a point solution accumulates quickly.

Criterion 5: Ecosystem and data connectivity

A vector database is a stop on the data pipeline, not a destination. Documents live in Confluence, Jira, SharePoint, Google Drive, S3, Salesforce, MongoDB, or any combination of those. Before a vector database can answer a query, those documents need to be extracted, chunked, embedded, and indexed. Then they need to stay current as source documents are updated or deleted.

This pipeline is where most production AI applications actually break. The search quality demo works. The sync job that keeps the index current is what causes the 2 a.m. pages.
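The core of any sync job is a diff between what the source holds and what the index holds. The sketch below shows that logic in isolation, assuming you can list document IDs with last-modified timestamps on both sides. The function name, document IDs, and timestamps are invented, and a real connector also has to handle pagination, permissions, and partial failures.

```python
def plan_sync(source_docs, indexed_docs):
    """Diff source against index; both arguments map doc_id -> ISO 8601 last-modified."""
    # New documents, or documents modified since they were last indexed.
    to_upsert = [
        doc_id
        for doc_id, modified in source_docs.items()
        if doc_id not in indexed_docs or indexed_docs[doc_id] < modified
    ]
    # Documents still in the index whose source has been deleted.
    to_delete = [doc_id for doc_id in indexed_docs if doc_id not in source_docs]
    return sorted(to_upsert), sorted(to_delete)

# ISO 8601 strings in one fixed format compare correctly as plain strings.
source = {
    "wiki/onboarding": "2025-03-02T10:00:00Z",  # edited after it was indexed
    "wiki/security": "2025-01-15T08:30:00Z",    # unchanged
    "wiki/new-page": "2025-03-05T14:20:00Z",    # not indexed yet
}
indexed = {
    "wiki/onboarding": "2025-02-01T09:00:00Z",
    "wiki/security": "2025-01-15T08:30:00Z",
    "wiki/old-page": "2024-11-20T16:45:00Z",    # deleted at the source
}
to_upsert, to_delete = plan_sync(source, indexed)
print("re-embed and index:", to_upsert)
print("remove from index:", to_delete)
```

Skipping the delete half of this diff is a common way for stale or revoked documents to keep surfacing in search results.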

Elasticsearch ships with over 400 connectors covering common enterprise content sources: Confluence, Jira, SharePoint, Google Drive, S3, Salesforce, GitHub, MongoDB, PostgreSQL, MySQL, and many others. Those connectors handle incremental sync, deletion propagation, and permission inheritance.

The other component that cuts pipeline complexity is the semantic_text field type. Instead of writing embedding model selection code, chunking logic, and vector dimensionality configuration separately, you declare a field as semantic_text and Elasticsearch handles the embedding pipeline automatically, using Jina AI embeddings by default. The default inference endpoint can differ by Elasticsearch version and deployment type, so confirm which model your cluster uses.

The mapping:

PUT /my-documents
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "body": { "type": "semantic_text" }
    }
  }
}

Indexing a document:

POST /my-documents/_doc
{
  "title": "Q4 Security Review: Findings and Remediation Plan",
  "body": "This report summarizes the findings from the Q4 penetration test conducted by an external firm. Three critical vulnerabilities were identified in the authentication service..."
}

Elasticsearch automatically chunks the body field, calls the embedding model, and stores the dense vectors alongside the source text. At query time, semantic search over this field requires no vector computation in your application:

GET /my-documents/_search
{
  "query": {
    "semantic": {
      "field": "body",
      "query": "what security issues were found last quarter"
    }
  }
}

What semantic_text eliminates: embedding model selection and management, chunking logic and chunk size tuning, vector dimensionality configuration, and the inference infrastructure to call an embedding model at index time. That is typically 200–400 lines of pipeline code you no longer need to write or maintain.

A practical evaluation checklist

Use this checklist before committing to a vector database for a production application.

Recall and throughput

  • Measured recall@10 and QPS at target p99 latency on a sample of 10,000–100,000 vectors from your actual production data

  • Verified index size and memory footprint at 10x your current data volume

  • Confirmed HNSW tuning parameters (m, ef_construction, num_candidates) are documented and reproducible

  • Tested quantization options (e.g., int8 and binary) and measured the recall penalty at your target compression ratio

Security

  • Confirmed authentication model: validated whether access is managed through API keys only or full RBAC with index-level and operation-level permissions

  • Tested document-level access control: verified that a query by User A cannot return documents scoped to User B

  • Reviewed field-level security: confirmed that sensitive fields are hidden by role rather than post-filtered at the application layer

  • Validated audit logging: confirmed that query and admin activity produces a structured, filterable record

  • Verified encryption requirements: confirmed encryption at rest and in transit, including current TLS version requirements

  • Assessed deployment options: confirmed whether air-gapped or on-prem deployment is available if required by compliance policy

Operational model

  • Mapped the full operational stack, including any secondary systems required for keyword search or sync

  • Estimated total infrastructure cost at production volume across all systems

  • Confirmed upgrade and schema migration path with a tested rollback procedure

  • Reviewed observability: query latency percentiles, recall degradation detection, and index lag monitoring

Ecosystem

  • Mapped data sources: identified every source that needs to feed the vector index

  • Evaluated connector strategy: confirmed available connectors or estimated engineering effort for custom sync

  • Tested index freshness: verified that incremental sync updates new and changed content in the index

  • Validated deletion propagation: confirmed that deleted source documents are removed from the index

  • Reviewed embedding pipeline ownership: confirmed whether inference is self-managed or platform-managed through semantic_text 

Frequently asked questions

What is a vector database, and how does it differ from a traditional database?
A traditional database retrieves by exact match — you query for specific field values and the database returns rows where the data matches. A vector database retrieves by semantic similarity: you provide a query vector and the database finds the records whose stored vectors are closest in high-dimensional space. Vector databases use approximate nearest neighbor (ANN) index structures like HNSW to do this at scale. Elasticsearch combines both: exact-match structured queries and vector similarity search in the same system, against the same documents, in the same query.

What metrics should I use to evaluate vector database performance?
The two primary metrics are recall@k and QPS (queries per second). Recall@k measures retrieval quality: of the k most relevant documents, what fraction did the approximate search return? QPS measures throughput. These trade against each other — higher recall typically requires more compute per query. For production AI applications, measure both at your target p99 latency on a sample of your own production vectors. ANN Benchmarks provides baseline recall-vs-QPS curves across standard datasets, but your embedding model and data distribution will produce different results.

When should I use a point solution vector database vs. an integrated platform?
Use a dedicated vector database (Pinecone, Qdrant, Milvus, Weaviate) if your workload is purely semantic similarity search with no keyword requirements, no complex per-user access controls, and no enterprise data source connectivity needs. Pinecone and Qdrant in particular have lower operational overhead for pure-vector use cases. Choose an integrated platform like Elasticsearch when you need hybrid search (vector + BM25), document-level access control for multitenant data isolation, enterprise connector integrations, or a single operational stack. Point solutions add integration cost as requirements broaden.

What security controls should I require from a vector database for enterprise or regulated use?
At minimum: RBAC with index-level and operation-level permissions, document-level security that enforces per-user or per-tenant data isolation at query time (not post-filtering in the application), field-level security for sensitive data fields, encryption at rest and in transit, and structured audit logging for compliance. Most dedicated vector databases provide API key authentication and namespace isolation — which is not equivalent to query-time document-level security. Elasticsearch provides all of these natively, including air-gapped on-prem deployment for the highest-security environments.

What is semantic_text in Elasticsearch, and what problem does it solve?
semantic_text is an Elasticsearch field type that manages the entire embedding pipeline automatically. When you declare a field as semantic_text, Elasticsearch handles document chunking, embedding model inference, vector storage, and query-time semantic retrieval without any application-layer pipeline code. This removes the need to select and manage an embedding model, write chunking logic, configure vector dimensionality, or run inference infrastructure. It uses Jina AI embeddings by default and is compatible with Elasticsearch's hybrid search and RRF ranking.

How does Elasticsearch's HNSW implementation compare to dedicated vector databases?
Elasticsearch uses HNSW for approximate nearest neighbor search, an algorithm that dedicated vector databases such as Qdrant, Milvus, and Weaviate also use or offer. The tunable parameters are m (graph connectivity), ef_construction (index-time beam width), and num_candidates (query-time search width). Elasticsearch also supports int8 and binary quantization, which shrink each stored vector by 4x and 32x respectively relative to float32, with a measurable but often acceptable recall penalty. Elasticsearch Labs publishes reproducible benchmark configurations so you can reproduce and extend their measurements on your own hardware and data. For pure-vector workloads, dedicated databases sometimes show higher QPS at equivalent recall, so run your own benchmarks on your data to verify.

Which vector database is best for a production AI application?
It depends on your requirements. For pure semantic similarity search with minimal operational requirements and a single tenant, Pinecone and Qdrant are well-optimized options with lower setup friction. For large-scale self-hosted deployments, Milvus is mature and widely deployed. For production AI applications that require hybrid search, document-level security, data source connectivity across many enterprise systems, or a single operational stack, Elasticsearch covers all of those in one system — with over 400 connectors for data ingestion and semantic_text for automated embedding pipeline management.

How do I evaluate vector database security for a multitenant application?
Start by asking two questions: does the database enforce data isolation at query time (so tenant A's vectors never appear in tenant B's results), and can you prove it? API key authentication combined with namespace isolation is the most common security model in dedicated vector databases. It prevents accidental cross-tenant access but doesn't enforce document-level policies based on user identity or role. For regulated multitenant deployments, you need query-time document-level security where the access policy is evaluated as part of the search, not filtered afterward. Test it: run a query as a restricted user and verify the response contains no documents outside their scope.

Originally published on July 15, 2024.
