Search Lite

There are two forms of the search API: a “lite” query-string version that expects all its parameters to be passed in the query string, and the full request body version that expects a JSON request body and uses a rich search language called the query DSL.

The query-string search is useful for running ad hoc queries from the command line. For instance, this query finds all documents of type tweet that contain the word elasticsearch in the tweet field:

GET /_all/tweet/_search?q=tweet:elasticsearch

The next query looks for john in the name field and mary in the tweet field. The actual query is just

+name:john +tweet:mary

but the percent encoding needed for query-string parameters makes it appear more cryptic than it really is:

GET /_search?q=%2Bname%3Ajohn+%2Btweet%3Amary
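The encoding step can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library's urllib.parse.urlencode. The helper name lite_search_path is made up for illustration and is not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def lite_search_path(query, path="/_search"):
    """Return a request path with `query` percent-encoded as the q parameter.

    lite_search_path is a hypothetical helper used only for this illustration.
    """
    # urlencode escapes "+" as %2B and ":" as %3A, and turns spaces into "+".
    return path + "?" + urlencode({"q": query})


print(lite_search_path("+name:john +tweet:mary"))
# /_search?q=%2Bname%3Ajohn+%2Btweet%3Amary
```

Writing the readable query first and encoding it programmatically avoids hand-escaping mistakes.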

The + prefix indicates conditions that must be satisfied for our query to match. Similarly, a - prefix indicates conditions that must not match. All conditions without a + or - are optional: the more that match, the more relevant the document.
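The three kinds of condition can be illustrated with a toy model. The sketch below is not how Elasticsearch parses or scores queries. It only handles simple field:word clauses, and the names parse_clauses, contains, and toy_score are invented for this example. It counts matching clauses as a stand-in for relevance.

```python
def parse_clauses(query):
    """Split a simple lite query into must / must_not / optional (field, word) pairs."""
    must, must_not, optional = [], [], []
    for token in query.split():
        if token.startswith("+"):
            bucket, token = must, token[1:]
        elif token.startswith("-"):
            bucket, token = must_not, token[1:]
        else:
            bucket = optional
        field, _, word = token.partition(":")
        bucket.append((field, word))
    return must, must_not, optional


def contains(doc, clause):
    """True if the lowercased field value contains the word as a whole token."""
    field, word = clause
    return word in str(doc.get(field, "")).lower().split()


def toy_score(doc, query):
    """Return a toy relevance count, or None if the document is ruled out."""
    must, must_not, optional = parse_clauses(query)
    if not all(contains(doc, c) for c in must):
        return None  # a required condition failed
    if any(contains(doc, c) for c in must_not):
        return None  # a prohibited condition matched
    # Every matching clause adds to the toy score.
    return sum(1 for c in must + optional if contains(doc, c))


doc = {"name": "John Smith", "tweet": "Mary loves geo queries"}
print(toy_score(doc, "+name:john +tweet:mary tweet:geo"))
print(toy_score(doc, "+name:john -tweet:mary"))
```

In this toy model the first query matches with a higher count than it would without the optional tweet:geo clause, while the second query rules the document out because of the - clause.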

The _all Field

This simple search returns all documents that contain the word mary:

GET /_search?q=mary

In the previous examples, we searched for words in the tweet or name fields. However, the results from this query mention mary in three fields:

  • A user whose name is Mary
  • Six tweets by Mary
  • One tweet directed at @mary

How has Elasticsearch managed to find results in three different fields?

When you index a document, Elasticsearch takes the string values of all of its fields and concatenates them into one big string, which it indexes as the special _all field. For example, when we index this document:

{
    "tweet":    "However did I manage before Elasticsearch?",
    "date":     "2014-09-14",
    "name":     "Mary Jones",
    "user_id":  1
}

it’s as if we had added an extra field called _all with this value:

"However did I manage before Elasticsearch? 2014-09-14 Mary Jones 1"
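As a rough mental model only (Elasticsearch's internal handling is more involved than this), the concatenation can be sketched in Python. The function name toy_all_field is hypothetical.

```python
def toy_all_field(document):
    """Join the string form of every field value with single spaces.

    An illustration of the idea, not Elasticsearch's actual implementation.
    """
    return " ".join(str(value) for value in document.values())


doc = {
    "tweet": "However did I manage before Elasticsearch?",
    "date": "2014-09-14",
    "name": "Mary Jones",
    "user_id": 1,
}

print(toy_all_field(doc))
# However did I manage before Elasticsearch? 2014-09-14 Mary Jones 1
```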

The query-string search uses the _all field unless another field name has been specified.

The _all field is a useful feature while you are getting started with a new application. Later, you will find that you have more control over your search results if you query specific fields instead of the _all field. When the _all field is no longer useful to you, you can disable it, as explained in Metadata: _all Field.

More Complicated Queries

The next query searches for tweets, using the following criteria:

  • The name field contains mary or john
  • The date is greater than 2014-09-10
  • The _all field contains either of the words aggregations or geo

+name:(mary john) +date:>2014-09-10 +(aggregations geo)

As a properly encoded query string, this looks like the slightly less readable result:

?q=%2Bname%3A(mary+john)+%2Bdate%3A%3E2014-09-10+%2B(aggregations+geo)
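When an encoded query string like this needs debugging, decoding it back to its readable form can help. The sketch below uses Python's standard urllib.parse functions. Passing safe="()" to quote_plus is an assumption made here so that parentheses stay unescaped, as they are in the string above.

```python
from urllib.parse import quote_plus, unquote_plus

encoded = "%2Bname%3A(mary+john)+%2Bdate%3A%3E2014-09-10+%2B(aggregations+geo)"

# Decode: "+" becomes a space and %XX escapes become literal characters.
readable = unquote_plus(encoded)
print(readable)
# +name:(mary john) +date:>2014-09-10 +(aggregations geo)

# Re-encode: safe="()" keeps parentheses literal so the round trip is exact.
round_trip = quote_plus(readable, safe="()")
print(round_trip == encoded)
# True
```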

As you can see from the preceding examples, this lite query-string search is surprisingly powerful. Its query syntax, which is explained in detail in the Query String Syntax reference docs, allows us to express quite complex queries succinctly. This makes it great for throwaway queries from the command line or during development.

However, you can also see that its terseness can make it cryptic and difficult to debug. It is also fragile: a slight syntax error in the query string, such as a misplaced -, :, /, or ", causes it to return an error instead of results.

Finally, the query-string search allows any user to run potentially slow, heavy queries on any field in your index, possibly exposing private information or even bringing your cluster to its knees!

For these reasons, we don’t recommend exposing query-string searches directly to your users, unless they are power users who can be trusted with your data and with your cluster.

Instead, in production we usually rely on the full-featured request body search API, which does all of this, plus a lot more. Before we get there, though, we first need to take a look at how our data is indexed in Elasticsearch.