Glossary

k-Nearest Neighbors (kNN)

A search method that finds the k vectors most similar to a query by comparing the query against every item in the dataset using a chosen distance metric. Because kNN is exhaustive, it is exact and always returns the true nearest neighbors. However, its cost grows linearly with dataset size, O(n·d) per query for n vectors of dimension d, which makes it impractical for large datasets. In practice it is used for small datasets or as a ground-truth baseline for measuring the recall of approximate nearest neighbor (ANN) methods.
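
A minimal sketch of exact (brute-force) kNN in plain Python, assuming Euclidean distance as the metric. The function names `exact_knn` and `recall_at_k` are hypothetical and not taken from any particular library. The code scores every stored vector against the query and keeps the k closest. It then treats that exact result as ground truth to measure the recall of a hypothetical approximate method's output.

```python
import heapq
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length vectors (assumed metric)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_knn(query, vectors, k, distance=euclidean):
    # Compare the query against EVERY stored vector: O(n * d) work per query
    scored = ((distance(query, v), i) for i, v in enumerate(vectors))
    # Keep the k smallest distances; ties are broken by the lower index
    return [i for _, i in heapq.nsmallest(k, scored)]

def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true neighbors that an approximate method also returned
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
truth = exact_knn([0.1, 0.1], vectors, k=3)
print(truth)  # → [0, 4, 1]

# Hypothetical ANN output that found 2 of the 3 true neighbors
print(recall_at_k([0, 1, 3], truth))
```

To use cosine distance instead, pass a different `distance` function. The cost stays linear in the number of stored vectors, which is why this approach is reserved for small collections or offline evaluation.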

Knowledge Base

A collection of documents or data from which a search or retrieval-augmented generation (RAG) system retrieves information at query time. It can contain structured or unstructured content, such as company documents, product manuals, or research papers. In embedding-based systems, the knowledge base is split into chunks, and each chunk is embedded and indexed in a vector store. Its quality directly determines the quality of retrieval and generation: incomplete or noisy content yields unreliable results regardless of how capable the model is.
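
The sketch below walks through the chunk, embed, and index steps under deliberately simplified assumptions. Documents are split into overlapping fixed-size word windows. A toy term-count vector, the hypothetical `embed` function, stands in for a real embedding model. A production system would instead use a trained embedding model and a vector store. All names here (`chunk_words`, `build_index`, `retrieve`) are illustrative.

```python
import math
import re
from collections import Counter

def chunk_words(text, size=8, overlap=2):
    # Split text into windows of `size` words; consecutive windows share `overlap` words
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    # HYPOTHETICAL stand-in for an embedding model: sparse lowercase term counts
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(documents, size=8, overlap=2):
    # Chunk each document, embed each chunk, and keep (doc_id, chunk, vector) rows
    return [(doc_id, chunk, embed(chunk))
            for doc_id, text in documents.items()
            for chunk in chunk_words(text, size, overlap)]

def retrieve(index, query, k=2):
    # Rank every chunk against the query (a real vector store would use an ANN index)
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:k]]

docs = {
    "manual": "To reset the router hold the power button for ten seconds until the light blinks",
    "faq": "Refunds are issued within five business days after the returned item arrives",
}
index = build_index(docs)
print(retrieve(index, "How do I reset the router?", k=1))
# → [('manual', 'To reset the router hold the power button')]
```

The overlap between chunks reduces the chance that a relevant sentence is cut in half at a chunk boundary, at the cost of storing some text twice.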

Knowledge Distillation

Training a smaller model (the student) to replicate the behavior of a larger, more capable model (the teacher). The student learns from the teacher's outputs as well as from raw data and hard labels. These outputs, such as softened probability distributions or embeddings, carry richer information about how examples and classes relate to one another. The result is a compact model that comes closer to the teacher's performance than it would if trained on the raw data alone.
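
One common way to implement this follows the soft-target approach popularized by Hinton et al. (2015). It blends two terms. The first is a KL-divergence term between temperature-softened teacher and student distributions. The second is the usual cross-entropy on the true label. This is a sketch of that one formulation, not the only form of distillation. The temperature, the `alpha` weighting, and the function names are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 flattens the distribution, exposing how the model ranks wrong classes
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student's distribution q is from the teacher's p
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    # Soft-target term: match the teacher's softened output distribution
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Hard-target term: ordinary cross-entropy against the ground-truth label
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard

teacher = [4.0, 1.5, 0.2]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.2, 1.5, 4.0]    # confidently predicts the wrong class
print(distillation_loss(close_student, teacher, label=0))
print(distillation_loss(far_student, teacher, label=0))  # noticeably larger
```

The `temperature ** 2` factor keeps the soft-target gradients on a scale comparable to the hard-label term as the temperature changes.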
