Finding gophers with vector search in Elasticsearch and Go

Building software in any programming language, including Go, is a commitment to a lifetime of learning. Throughout her university and working career, Carly has dabbled in many programming languages and technologies, including the latest and greatest implementations of vector search. But that wasn't enough! So recently Carly started playing with Go, too.

Just like animals, programming languages, and your friendly author, search has undergone an evolution, and it can be difficult to decide between the different practices for your own search use case. In this blog, we'll share an overview of vector search along with examples of each approach using Elasticsearch and the Elasticsearch Go client.

Prerequisites

To follow along with this example, ensure the following prerequisites are met:

  1. Installation of Go version 1.13 or later
  2. Creation of your own Go repo with the Elasticsearch Go client added as a dependency
  3. Creation of your own Elasticsearch cluster, populated with a set of rodent-based pages from Wikipedia, including one for our friendly gopher:

[Image: Wikipedia Gopher page]

Connecting to Elasticsearch

In our examples, we'll use the Typed API offered by the Go client. Establishing a secure connection for any query requires configuring the client with either:

  1. A Cloud ID and API key, if making use of Elastic Cloud.
  2. A cluster URL, username, password, and CA certificate, if self-managing (a sketch follows the Elastic Cloud example below).

Connecting to our cluster located on Elastic Cloud would look like this:

func GetElasticsearchClient() (*elasticsearch.TypedClient, error) {
	var cloudID = os.Getenv("ELASTIC_CLOUD_ID")
	var apiKey = os.Getenv("ELASTIC_API_KEY")

	var es, err = elasticsearch.NewTypedClient(elasticsearch.Config{
		CloudID: cloudID,
		APIKey:  apiKey,
		// Log requests and responses in color to stdout for debugging
		Logger:  &elastictransport.ColorLogger{Output: os.Stdout, EnableRequestBody: true, EnableResponseBody: true},
	})

	if err != nil {
		return nil, fmt.Errorf("unable to connect: %w", err)
	}

	return es, nil
}
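
For the second option, connecting to a self-managed cluster might look like the sketch below. Note that this is a minimal illustration: the function name and environment variable names are assumptions for this example rather than code from the series repo.

// A sketch of option 2: cluster URL, credentials, and CA certificate.
func GetSelfManagedClient() (*elasticsearch.TypedClient, error) {
	// Read the PEM-encoded CA certificate presented by the cluster
	cert, err := os.ReadFile(os.Getenv("ELASTIC_CA_CERT_PATH"))
	if err != nil {
		return nil, fmt.Errorf("unable to read CA certificate: %w", err)
	}

	es, err := elasticsearch.NewTypedClient(elasticsearch.Config{
		Addresses: []string{os.Getenv("ELASTIC_CLUSTER_URL")},
		Username:  os.Getenv("ELASTIC_USERNAME"),
		Password:  os.Getenv("ELASTIC_PASSWORD"),
		CACert:    cert,
	})
	if err != nil {
		return nil, fmt.Errorf("unable to connect: %w", err)
	}

	return es, nil
}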

The client connection can then be used for vector search, as shown in subsequent sections.

Vector search

Vector search attempts to address the limitations of the keyword search approach covered in part one by converting the search problem into a mathematical comparison using vectors. The document ingestion process gains an additional stage: using a model to convert the document into a dense vector representation, or simply an array of numbers. The advantage of this approach is that you can search non-text documents, such as images and audio, because documents and queries alike are translated into vectors.

[Image: Vector search diagram]

In simple terms, vector search is a set of vector distance calculations. In the illustration below, the vector representation of our query Go Gopher is compared against the documents in the vector space, and the closest results (denoted by the constant k) are returned:

[Image: Gopher vector space example]
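
To make the distance calculation concrete, here is a toy sketch of cosine similarity, the metric we'll configure on the index mapping later in this post. It's purely illustrative; Elasticsearch performs these comparisons internally, so you never write this yourself. (It uses math.Sqrt from the standard library.)

// cosineSimilarity returns a value between -1 and 1: the closer to 1,
// the more similar the direction of the two vectors, and therefore the
// more semantically similar the texts they represent.
func cosineSimilarity(a, b []float32) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}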

Depending on the approach used to generate the embeddings for your documents, there are two different ways to find out what gophers eat.

Approach 1: Bring your own model

With a Platinum license, it's possible to generate the embeddings within Elasticsearch by uploading the model and using the inference API. There are six steps involved in setting up the model:

  1. Select a PyTorch model to upload from a model repository. For this example, we're using the sentence-transformers/msmarco-MiniLM-L-12-v3 model from Hugging Face to generate the embeddings.

  2. Load the model into Elastic with the Eland machine learning client for Python, using the credentials for our Elasticsearch cluster and the task type text_embedding:

eland_import_hub_model \
--cloud-id $ELASTIC_CLOUD_ID \
--es-api-key $ELASTIC_API_KEY \
--hub-model-id sentence-transformers/msmarco-MiniLM-L-12-v3 \
--task-type text_embedding \
--start
  3. Once uploaded, quickly test the model sentence-transformers__msmarco-minilm-l-12-v3 with a sample document to ensure the embeddings are generated as expected:

[Image: Elastic test trained model example]
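
If you'd rather test from the Dev Tools console than the Kibana UI, the infer trained model API performs the same check (the sample text here is only an illustration):

POST _ml/trained_models/sentence-transformers__msmarco-minilm-l-12-v3/_infer
{
  "docs": [
    {
      "text_field": "What do Gophers eat?"
    }
  ]
}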

  4. Create an ingest pipeline containing an inference processor. This will allow the vector representation to be generated using the uploaded model:
PUT _ingest/pipeline/search-rodents-vector-embedding-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
        "target_field": "text_embedding",
        "field_map": {
          "body_content": "text_field"
        }
      }
    }
  ]
}
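
Before reindexing everything, you can optionally sanity-check the pipeline with the simulate pipeline API; the body_content value below is an illustrative stand-in for a real crawled document:

POST _ingest/pipeline/search-rodents-vector-embedding-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "body_content": "Pocket gophers eat plant roots and tubers."
      }
    }
  ]
}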
  5. Create a new index containing the field text_embedding.predicted_value of type dense_vector to store the vector embeddings generated for each document:
PUT vector-search-rodents
{
  "mappings": {
    "properties": {
      "text_embedding.predicted_value": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine"
      },
      "text": {
        "type": "text"
      }
    }
  }
}
  6. Reindex the documents using the newly created ingest pipeline to generate the text embeddings as the additional field text_embedding.predicted_value on each document:
POST _reindex
{
  "source": {
    "index": "search-rodents"
  },
  "dest": {
    "index": "vector-search-rodents",
    "pipeline": "search-rodents-vector-embedding-pipeline"
  }
}

Now we can use the Knn option on the same search API against the new index vector-search-rodents, as shown in the example below:

func VectorSearch(client *elasticsearch.TypedClient, term string) ([]Rodent, error) {
	res, err := client.Search().
		Index("vector-search-rodents").
		Knn(types.KnnQuery{
			// Field in the document containing the vector
			Field: "text_embedding.predicted_value",
			// Number of neighbors to return
			K: 10,
			// Number of candidates to evaluate per shard in the comparison
			NumCandidates: 10,
			// Generate the query vector using the same model used in the inference processor
			QueryVectorBuilder: &types.QueryVectorBuilder{
				TextEmbedding: &types.TextEmbedding{
					ModelId:   "sentence-transformers__msmarco-minilm-l-12-v3",
					ModelText: term,
				},
			},
		}).
		Do(context.Background())

	if err != nil {
		return nil, fmt.Errorf("error in rodents vector search: %w", err)
	}

	return getRodents(res.Hits.Hits)
}

Converting the JSON result object via unmarshalling is done in exactly the same way as in the keyword search example from part one. The K and NumCandidates options allow us to configure the number of neighbor documents to return and the number of candidates to consider per shard. Note that increasing the number of candidates increases the accuracy of the results, but leads to a longer-running query, as more comparisons are performed.
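
For reference, the getRodents helper might look like the minimal sketch below. The Rodent field names are assumptions based on the results shown next; the exact struct lives in the series repo:

// Rodent mirrors the fields we care about in each document's _source.
type Rodent struct {
	ID    string `json:"id"`
	Title string `json:"title"`
	Url   string `json:"url"`
}

// getRodents unmarshals the _source of each hit into a Rodent.
func getRodents(hits []types.Hit) ([]Rodent, error) {
	rodents := make([]Rodent, 0, len(hits))
	for _, hit := range hits {
		var rodent Rodent
		if err := json.Unmarshal(hit.Source_, &rodent); err != nil {
			return nil, fmt.Errorf("unable to unmarshal hit: %w", err)
		}
		rodents = append(rodents, rodent)
	}
	return rodents, nil
}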

When the code is executed with the query What do Gophers eat?, the results returned look similar to those below, highlighting that the Gopher article contains the information requested, unlike the keyword search from part one:

[
  {ID:64f74ecd4acb3df024d91112 Title:Gopher - Wikipedia Url:https://en.wikipedia.org/wiki/Gopher} 
  {ID:64f74ed34acb3d71aed91fcd Title:Squirrel - Wikipedia Url:https://en.wikipedia.org/wiki/Squirrel} 
  // Other results omitted
]
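
Putting the pieces together, invoking the search might look like this minimal sketch, reusing the GetElasticsearchClient and VectorSearch functions defined above:

func main() {
	client, err := GetElasticsearchClient()
	if err != nil {
		log.Fatal(err)
	}

	rodents, err := VectorSearch(client, "What do Gophers eat?")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%+v\n", rodents)
}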

Approach 2: Hugging Face inference API

Another option is to generate these same embeddings outside of Elasticsearch and ingest them as part of your document. As this option does not make use of an Elasticsearch machine learning node, it can be done on the free tier.

Hugging Face exposes a free-to-use, rate-limited inference API that, with an account and API token, can be used to generate the same embeddings manually for experimentation and prototyping to help you get started. It is not recommended for production use. Invoking your own models locally to generate embeddings or using the paid API can also be done using a similar approach.

In the function GetTextEmbeddingForQuery below, we use the inference API against our query string to generate the vector returned from a POST request to the endpoint:

// GetTextEmbeddingForQuery generates an embedding for the query term
// using the Hugging Face inference API.
func GetTextEmbeddingForQuery(term string) ([]float32, error) {
	// HTTP endpoint for the feature-extraction pipeline
	model := "sentence-transformers/msmarco-MiniLM-L-12-v3"
	posturl := fmt.Sprintf("https://api-inference.huggingface.co/pipeline/feature-extraction/%s", model)

	// JSON body; wait_for_model avoids an error if the model is still loading
	body := []byte(fmt.Sprintf(`{
		"inputs": "%s",
		"options": {"wait_for_model": true}
	}`, term))

	// Create an HTTP POST request
	r, err := http.NewRequest(http.MethodPost, posturl, bytes.NewBuffer(body))
	if err != nil {
		return nil, err
	}

	// Authenticate with the Hugging Face API token
	token := os.Getenv("HUGGING_FACE_TOKEN")
	r.Header.Add("Authorization", fmt.Sprintf("Bearer %s", token))

	client := &http.Client{}
	res, err := client.Do(r)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()

	// Decode the returned embedding
	var post []float32
	if err := json.NewDecoder(res.Body).Decode(&post); err != nil {
		return nil, err
	}

	return post, nil
}

The resulting vector, of type []float32, is then passed as the QueryVector, instead of using the QueryVectorBuilder option that leverages the model previously uploaded to Elastic:

func VectorSearchWithGeneratedQueryVector(client *elasticsearch.TypedClient, term string) ([]Rodent, error) {
	vector, err := GetTextEmbeddingForQuery(term)
	if err != nil {
		return nil, fmt.Errorf("unable to generate vector: %w", err)
	}

	res, err := client.Search().
		Index("vector-search-rodents").
		Knn(types.KnnQuery{
			// Field in the document containing the vector
			Field: "text_embedding.predicted_value",
			// Number of neighbors to return
			K: 10,
			// Number of candidates to evaluate per shard in the comparison
			NumCandidates: 10,
			// Query vector returned from the Hugging Face inference API
			QueryVector: vector,
		}).
		Do(context.Background())

	if err != nil {
		return nil, err
	}

	return getRodents(res.Hits.Hits)
}

Note that the K and NumCandidates options remain the same across both approaches, and that the same results are returned, since the same model is used to generate the embeddings.

Conclusion

Here we've discussed how to perform vector search in Elasticsearch using the Elasticsearch Go client. Check out the GitHub repo for all the code in this series. Follow on to part three for an overview of combining vector search in Go with the keyword search capabilities covered in part one.

Until then, happy gopher hunting!

Resources

  1. Elasticsearch Guide
  2. Elasticsearch Go client
  3. What is vector search? | Elastic