Go-ing gopher hunting with Elasticsearch and Go

Building software in any programming language, including Go, is committing to a lifetime of learning. Throughout her university and working career, Carly has needed to adapt to being a polyglot and dabble in many programming languages, including Python, C, JavaScript, TypeScript, and Java. But that wasn't enough! So recently she started playing with Go too!

Just like animals, programming languages, and one of your friendly authors, search has evolved through a range of practices, and it can be difficult to decide which one suits your own use case. In this blog, we'll share an overview of traditional keyword search along with an example using Elasticsearch and the Elasticsearch Go client.

Prerequisites

To follow along with this example, ensure the following prerequisites are met:

Installation of Go version 1.13 or later
Creation of your own Go repo using the recommended structure and package management covered in the Go documentation
Creation of your own Elasticsearch cluster, populated with a set of rodent-themed Wikipedia pages, including the one for our friendly Gopher

Connecting to Elasticsearch

In our examples, we will make use of the Typed API offered by the Go client. Establishing a secure connection for any query requires configuring the client using either:

Cloud ID and API key, if making use of Elastic Cloud.
Cluster URL, username, password and the certificate.

Connecting to our cluster located on Elastic Cloud would look like this:

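import (
	"fmt"
	"os"

	"github.com/elastic/elastic-transport-go/v8/elastictransport"
	"github.com/elastic/go-elasticsearch/v8"
)

// GetElasticsearchClient builds a typed client from the ELASTIC_CLOUD_ID and
// ELASTIC_API_KEY environment variables.
func GetElasticsearchClient() (*elasticsearch.TypedClient, error) {
	cloudID := os.Getenv("ELASTIC_CLOUD_ID")
	apiKey := os.Getenv("ELASTIC_API_KEY")

	es, err := elasticsearch.NewTypedClient(elasticsearch.Config{
		CloudID: cloudID,
		APIKey:  apiKey,
		// Log requests and responses, including their bodies, to stdout.
		Logger: &elastictransport.ColorLogger{
			Output:             os.Stdout,
			EnableRequestBody:  true,
			EnableResponseBody: true,
		},
	})
	if err != nil {
		return nil, fmt.Errorf("unable to connect: %w", err)
	}

	return es, nil
}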

The client connection can then be used for searching, as shown later.

Keyword search

Keyword search is the foundational search type that we have been familiar with since the inception of Archie, the first documented internet search engine, written in 1990.

A central component of keyword search is the translation of documents into an inverted index. Much like the index found at the back of a textbook, an inverted index maps each token to its locations in each document. The diagram below shows the key stages of generating the index:

As shown above, the generation of tokens in Elasticsearch comprises three key stages:

Stripping unnecessary characters via zero or more char_filters. In our example, we strip out HTML elements within the body_content field via the html_strip filter.
Splitting the content into tokens with the standard tokenizer, which splits on whitespace and most punctuation.
Removing or transforming tokens from the output stream of the tokenizer using zero or more filter options, such as the lowercase token filter, or stemmers such as the snowball stemmer, which reduce tokens to their root form.
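
To make the three stages more concrete, here is a minimal sketch in plain Go that mimics them and builds a tiny inverted index. It is an illustration only, not how Elasticsearch implements analysis: the regular expression is a naive stand-in for html_strip, the tokenizer simply splits on non-alphanumeric characters, and the index records document IDs rather than token positions. The analyze and buildIndex names and the sample documents are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
	"unicode"
)

// htmlTag is a deliberately naive stand-in for the html_strip char filter.
var htmlTag = regexp.MustCompile(`<[^>]*>`)

// analyze mimics the three stages: char filter, tokenizer, token filter.
func analyze(text string) []string {
	// Stage 1: char filter, replace anything that looks like an HTML tag.
	stripped := htmlTag.ReplaceAllString(text, " ")

	// Stage 2: tokenizer, split on anything that is not a letter or digit.
	tokens := strings.FieldsFunc(stripped, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})

	// Stage 3: token filter, lowercase every token.
	for i, t := range tokens {
		tokens[i] = strings.ToLower(t)
	}
	return tokens
}

// buildIndex maps each token to the sorted IDs of the documents containing it.
func buildIndex(docs map[int]string) map[string][]int {
	index := make(map[string][]int)
	for id, body := range docs {
		seen := make(map[string]bool) // avoid listing a document twice per token
		for _, tok := range analyze(body) {
			if !seen[tok] {
				seen[tok] = true
				index[tok] = append(index[tok], id)
			}
		}
	}
	for tok := range index {
		sort.Ints(index[tok])
	}
	return index
}

func main() {
	// Made-up sample documents.
	docs := map[int]string{
		1: "<p>Gophers are burrowing rodents.</p>",
		2: "<p>Beavers are semiaquatic rodents.</p>",
	}
	index := buildIndex(docs)

	fmt.Println(index["rodents"]) // [1 2]
	fmt.Println(index["gophers"]) // [1]
	fmt.Println(index["gopher"])  // []
}
```

Because this sketch applies no stemmer, gophers and gopher remain distinct tokens, which is the kind of gap a stemming token filter is meant to close.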

Searching in Elasticsearch with Go

When querying with the Go client, we specify the index we want to search and pass in the query and other options, as in the example below:

func KeywordSearch(client *elasticsearch.TypedClient, term string) ([]Rodent, error) {
	res, err := client.Search().
		Index("search-rodents").
		Query(&types.Query{
			Match: map[string]types.MatchQuery{
				"title": {Query: term},
			},
		}).
		From(0).
		Size(10).
		Do(context.Background())

	if err != nil {
		return nil, fmt.Errorf("could not search for rodents: %w", err)
	}

	return getRodents(res.Hits.Hits)
}

In the above example, we perform a standard match query to find any document in our index whose title matches the string passed into our function. Note that we pass a new, empty context to the search execution via Do(context.Background()). Any error raised by the request is returned in err for logging and error handling.
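
A background context never times out or gets cancelled. As a hedged sketch of one alternative, the snippet below derives a context with a deadline via context.WithTimeout, which could be passed to Do in the same way. To keep it runnable without a cluster, slowSearch is a hypothetical stand-in for the search call, and the searchWithTimeout helper and the timeout values are arbitrary choices for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowSearch is a hypothetical stand-in for a call that accepts a context,
// such as the Do(ctx) step of a search. It "responds" after the given delay
// unless the context is done first.
func slowSearch(ctx context.Context, delay time.Duration) (string, error) {
	select {
	case <-time.After(delay):
		return "results", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// searchWithTimeout gives the search at most `timeout` to complete.
func searchWithTimeout(timeout, delay time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // always release the timer held by the context

	res, err := slowSearch(ctx, delay)
	if err != nil {
		return "", fmt.Errorf("could not search for rodents: %w", err)
	}
	return res, nil
}

func main() {
	// The simulated search takes longer than the deadline allows.
	_, err := searchWithTimeout(10*time.Millisecond, time.Second)
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true

	// The simulated search finishes well within the deadline.
	res, _ := searchWithTimeout(time.Second, 10*time.Millisecond)
	fmt.Println(res) // results
}
```

Wrapping the error with %w keeps the underlying cause available to errors.Is, so callers can tell a timeout apart from other failures.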

Results are returned in res.Hits.Hits, where each hit's Source_ field contains the document itself as raw JSON. To convert this source into a Go-friendly struct, we unmarshal it using the Go encoding/json package, as shown in the example below:

func getRodents(hits []types.Hit) ([]Rodent, error) {
	var rodents []Rodent

	for _, hit := range hits {
		var currentRodent Rodent
		err := json.Unmarshal(hit.Source_, &currentRodent)

		if err != nil {
			return nil, fmt.Errorf("an error occurred while unmarshaling rodent %s: %w", hit.Id_, err)
		}

		currentRodent.ID = hit.Id_
		rodents = append(rodents, currentRodent)
	}

	return rodents, nil
}
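
The Rodent type used above is not shown in this post. Judging by the fields in the printed results (ID, Title and Url), a minimal version might look like the sketch below. The JSON tag names, the decodeRodent helper and the sample document are assumptions made for illustration, so adjust them to match your own index mapping.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Rodent is a hypothetical shape for the indexed documents.
// The JSON tag names are assumptions, not taken from the real mapping.
type Rodent struct {
	ID    string `json:"-"` // set from the hit's ID, not from the source
	Title string `json:"title"`
	Url   string `json:"url"`
}

// decodeRodent unmarshals the raw JSON source of a single hit.
func decodeRodent(id string, source json.RawMessage) (Rodent, error) {
	var r Rodent
	if err := json.Unmarshal(source, &r); err != nil {
		return Rodent{}, fmt.Errorf("could not decode rodent %s: %w", id, err)
	}
	r.ID = id
	return r, nil
}

func main() {
	// Made-up source document; fields without a matching struct field,
	// such as body_content here, are ignored by json.Unmarshal.
	source := json.RawMessage(`{
		"title": "Gopher - Wikipedia",
		"url": "https://en.wikipedia.org/wiki/Gopher",
		"body_content": "<p>Gophers are burrowing rodents.</p>"
	}`)

	rodent, err := decodeRodent("64f74ecd4acb3df024d91112", source)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", rodent)
}
```

Keeping the ID out of the JSON mapping mirrors the approach in getRodents, where it is copied from the hit metadata rather than from the document body.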

Searching for the query gopher and unmarshalling the results returns the Wikipedia page for Gopher, as expected:

[
  {ID:64f74ecd4acb3df024d91112 Title:Gopher - Wikipedia Url:https://en.wikipedia.org/wiki/Gopher}
]

However, if we ask "What do Gophers eat?", we don't quite get the results we want:

[]

A simple keyword search returns results to your Go application quickly and behaves in a way we are familiar with from the applications we use every day. It also works well for exact term matches, which suit scenarios such as looking for a particular company or term.

However, as we see above, it struggles to identify context and semantics due to the vocabulary mismatch problem. Furthermore, support for non-text file formats such as images and audio is challenging.

Conclusions

Here we've discussed how to perform traditional text queries in Elasticsearch using the Elasticsearch Go client. Given that Go is widely used for infrastructure scripting and building web servers, it's useful to know how to search in Go.

Check out the GitHub repo for all the code in this series. Follow on to part 2 to gain an overview of vector search and how to perform vector search in Go. Until then, happy gopher hunting!
