Presenting Play: A Preview of an Elasticsearch Playground
UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.
To develop the perfect search, you need to structure the documents correctly, configure the right text processing, wire up mappings and do some searches to see how well it works.
This is typically an iterative process. “Oh, it would be great if I could search like this. But that requires a change to that analyzer, and tweaking the mappings a bit, and [etc., etc.…]”
While changing mappings and analyzers for an existing index is a lot of work with plain Elasticsearch, Play makes it easy by doing those things for you. Change an analyzer configuration, run, and a split second later you can see how it affects your searches. It also assists you with configuring mappings and searches, with context-aware autocompletion and documentation.
When you play, it creates the indexes with the right mappings and analyzers for you, indexes all your sample documents and then runs all the sample searches. It can do this very quickly because we impose one limitation: you cannot have a huge set of sample documents.
Note: Play is neither feature complete nor bug free. This is a preview to show where we’re headed, and to get some early feedback. Expect bugs!
Play’s user interface is structured into five editor tabs.
In addition to an Overview tab, there are supplementary tabs designed to optimize the screen real estate for the task at hand. They are different views on the same data, and are kept in sync.
- The Documents tab uses most of the editor's screen real estate for sample documents, with the rest spent on showing the results.
- Analysis shows an editor for configuring analyzers, tokenizers and so on, with a view that shows how text is processed step by step.
- Mappings uses most of the space for the mappings editor, with some space left for documentation for the type of mapping you’re doing, a small view that shows how text is processed for the currently selected field, and lastly, the resulting mapping.
- Searches uses a lot of space for editing searches and for showing results. There’s also a view for showing documentation for the search/filter/facet/parameter your cursor is at.
Play prefers YAML over JSON. YAML is easier to read and edit (by humans), and you can comment it.
While JSON is valid YAML, to get the most out of Play (i.e. context-aware autocompletion and documentation), you should use the supported subset of YAML. This is documented in Play’s help section.
In “Can I Run it on My Cluster?” we describe how you can export things as JSON, though!
Before we look further into the various features of Play, let’s start out with a simple Hello World.
In this exercise we want to index three documents and run two searches.
- First, open Play in a separate tab or window. Click the “Clear”-button in the top right corner to clear out the introductory text.
- Paste this text in the “Documents” editor, i.e. the top left editor:
    # This is the first document
    quote: Man had always assumed that he was more intelligent than dolphins because he had achieved so much - the wheel, New York, wars and so on - whilst all the dolphins had ever done was muck about in the water having a good time. But conversely, the dolphins had always believed that they were far more intelligent than man - for precisely the same reasons.
    author: Douglas Adams
    ---
    # This is the second document. The three dashes (---) separate them.
    quote: The ships hung in the sky in much the same way that bricks don't.
    author: Douglas Adams
    ---
    # And the third...
    quote: Winter is coming.
    author: George R. R. Martin
- Paste this into the “Searches” editor, i.e. the top right editor:
    # This is the first search
    query:
      match:
        quote: dolphins
    ---
    # Second search.
    facets:
      words:
        terms:
          field: author
- Click on “Run” in the top right menu, or press Ctrl+Enter to run the play.
- Results appear in the bottom right window. There’s one tab for the resulting mappings, and one for each search.
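For readers more used to JSON request bodies, the two sample searches above correspond to the structures below. This is a minimal Python sketch using only the standard library. The variable and function names are illustrative assumptions, and the script only prints the bodies rather than contacting Play or any cluster.

```python
import json

# The first sample search: a match query on the "quote" field.
match_search = {"query": {"match": {"quote": "dolphins"}}}

# The second sample search: a terms facet named "words" on the "author" field.
facet_search = {"facets": {"words": {"terms": {"field": "author"}}}}


def to_json(body):
    """Serialize a search body as it might appear in a JSON request."""
    return json.dumps(body, indent=2, sort_keys=True)


if __name__ == "__main__":
    for body in (match_search, facet_search):
        print(to_json(body))
```

The nesting mirrors the YAML indentation one to one, which is why the YAML form can be exported as JSON without losing anything except the comments.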
While Play is neither all-encompassing nor feature complete, we have spent a lot of time making sure we can add context-aware autocompletion, documentation and linting.
The various editors know where your cursor is, and whether the cursor is in a filter in a query in a facet. They already suggest many things:
- Most of the search structures, like query, filter and facet DSLs.
- Fields and types - both when mapping, modifying sample documents and specifying them as parameters in queries, facets or filters.
- Available analyzers, when configuring mappings and searches.
We will eventually teach Play to autocomplete everything, including suggesters, aggregations, etc.
Knowledge of the cursor's location and context is also used, for example, to highlight the results for the search you are currently working on, or to show how existing text in your sample documents is currently being tokenized when editing mappings.
Furthermore, we want to be able to highlight warnings and errors as they happen. The knowledge base is not comprehensive for the time being, though. You will get a small warning if you use inefficient filters, such as a top-level filter without any facets, or an “and” filter where a bool filter would be the better choice, as explained in Zachary Tong’s article on filter bitsets.
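To make the two warnings mentioned above concrete, here is a rough Python sketch of how such lint rules could be expressed over a search body held as a plain dictionary. It is not Play's implementation: the function name `lint_search` and the warning texts are hypothetical, and the check for `and` is deliberately naive (it would also flag a field that happens to be named "and").

```python
def lint_search(search):
    """Return a list of warning strings for a search body (a plain dict)."""
    warnings = []

    # Rule 1: a top-level filter with no facets, which the article names
    # as an inefficient pattern.
    if "filter" in search and "facets" not in search:
        warnings.append("top-level filter without facets")

    # Rule 2: any "and" filter, wherever it is nested.
    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "and":
                    location = "/".join(path + [key])
                    warnings.append("'and' filter at %s: consider 'bool'" % location)
                walk(value, path + [key])
        elif isinstance(node, list):
            for item in node:
                walk(item, path)

    walk(search, [])
    return warnings


if __name__ == "__main__":
    sample = {
        "filter": {
            "and": [
                {"term": {"author": "Douglas Adams"}},
                {"term": {"quote": "dolphins"}},
            ]
        }
    }
    for warning in lint_search(sample):
        print(warning)
```

Tracking the path while walking the structure is the same idea as knowing where the cursor is: a rule can report not only that something looks wrong, but where.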
As explained in our article on Elasticsearch from the bottom up, getting the text processing right is a very important part of working with search.
To make it easy to work with analyzers, tokenizers, token filters and so on, we’ve made an analysis editor that shows step by step how text is processed.
The image below shows how an input, e.g. “John Smith”, is first tokenized into [John, Smith] by the standard tokenizer before each term is subsequently filtered by the double_metaphone filter. Overlapping tokens are displayed as overlapping, and you can hover over a token to see the other tokens it overlaps.
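The step-by-step idea can be sketched in a few lines of Python: run the tokenizer, then each token filter in turn, and record the tokens after every stage. This is only an illustration under stated assumptions. The regex tokenizer is a crude stand-in for the standard tokenizer, a lowercase filter is swapped in for double_metaphone (which is not in the Python standard library), all function names are hypothetical, and token positions and overlaps are not modeled.

```python
import re


def simple_tokenizer(text):
    """Split on runs of word characters; a rough stand-in for the standard tokenizer."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Stand-in token filter; the article's example uses double_metaphone instead."""
    return [token.lower() for token in tokens]


def analyze_steps(text, tokenizer, token_filters):
    """Return [(step_name, tokens), ...] so each stage's output can be shown."""
    tokens = tokenizer(text)
    steps = [(tokenizer.__name__, tokens)]
    for token_filter in token_filters:
        tokens = token_filter(tokens)
        steps.append((token_filter.__name__, tokens))
    return steps


if __name__ == "__main__":
    for name, tokens in analyze_steps("John Smith", simple_tokenizer, [lowercase_filter]):
        print(name, tokens)
```

Keeping the intermediate token lists, rather than only the final one, is what makes it possible to see which stage is responsible for a surprising result.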
For a dive into analyzers, here’s a great read: All About Analyzers, Part One
Here’s exactly what happens when you run/execute a Play: An API-request consisting of all sample documents, searches, analyzers and mappings is sent to a backend running on Found’s servers. This backend uses a pool of in-memory Elasticsearch servers, and for every request …
- Creates index templates with the correct analysis and mappings configurations.
- Indexes all the documents. If a document causes an index to be created, the index is created according to the template made in the previous step.
- Gets the resulting mapping, which is a combination of what Elasticsearch guesses and what’s specified by the user.
- Runs all the searches and gets the results.
- Deletes all the indexes.
This all happens in memory and usually takes just a few tenths of a second.
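The sequence above can be sketched as an ordered plan of REST calls. The following Python is an assumption-laden illustration, not Play's backend: the function name `plan_requests`, the `play_` template prefix and the single document type are hypothetical, and the paths follow the Elasticsearch REST API of that era (index templates under `/_template`, typed document endpoints) as an assumption about what such a backend might call. The code only builds the plan and sends nothing.

```python
def plan_requests(index, doc_type, settings, mappings, documents, searches):
    """Return the ordered (method, path, body) requests for one run."""
    plan = []

    # 1. Index template carrying the analysis settings and the mappings.
    template = {"template": index + "*", "settings": settings, "mappings": mappings}
    plan.append(("PUT", "/_template/play_" + index, template))

    # 2. Index every sample document; the first one causes index creation.
    for document in documents:
        plan.append(("POST", "/%s/%s" % (index, doc_type), document))

    # 3. Fetch the resulting mapping (user-specified plus guessed fields).
    plan.append(("GET", "/%s/_mapping" % index, None))

    # 4. Run every sample search.
    for search in searches:
        plan.append(("POST", "/%s/_search" % index, search))

    # 5. Clean up by deleting the index.
    plan.append(("DELETE", "/" + index, None))
    return plan


if __name__ == "__main__":
    plan = plan_requests(
        "sample", "doc", {}, {},
        [{"quote": "Winter is coming.", "author": "George R. R. Martin"}],
        [{"query": {"match": {"quote": "winter"}}}],
    )
    for method, path, _body in plan:
        print(method, path)
```

A real run would presumably also need to refresh the index between steps 2 and 4 so that the documents are searchable; the article does not say how Play handles this.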
We are committed to open-sourcing Play under the same license as Elasticsearch, i.e. the Apache License, version 2.0.
That said, the Play environment is not something you’d want to have running on a live cluster. The “create lots of indexes and then delete them” process causes lots of changes to the cluster state, and it will not be very fast when things need to hit disk.
However, the search view with mapping-aware autocompletion and documentation is very useful on a live cluster. We intend to provide that functionality as a separate plugin that you can run on your cluster.
What you can do is export the resulting state of Play to a live cluster, using the save/export-pane. That’ll provide you with a shell-script you can run.
There are some things you cannot do with Play, at least not at present:
- Dynamic scripts are disabled.
- The sum of all documents, searches, mappings and analyzers cannot exceed 2 MiB.
- Shards and custom routing cannot be specified at this time.
We have many plans for Play. This is merely an initial preview release. There are known bugs, but with the amount of “When can I get to use it?!” feedback we have received, we wanted to make a preview available sooner rather than later.
As mentioned, Play will be available as an open source project. There’s some work ahead to decouple it from internal tools, and some general cleanups that need to happen before we can publish it.
Having said that, here are a few things we plan to do:
- Integrate the official documentation into the search and mapping views.
- Add more autocompletion, such as suggesters, decay scoring, and the upcoming aggregation feature.
- Make things commentable. We’re considering using gist’s commenting feature for this.
- Enable viewing of a gist’s history and loading an old version right from Play.
- Refactor the search view and turn it into an Elasticsearch plugin you can use on your own cluster.
- Add collaboration using TogetherJS.