One Big Useredit

Big, popular forums start out as small forums. One day we will find that one shard in our shared index is doing a lot more work than the other shards, because it holds the documents for a forum that has become very popular. That forum now needs its own index.

The index aliases that we’re using to fake an index per user give us a clean migration path for the big forum.

The first step is to create a new index dedicated to the forum, and with the appropriate number of shards to allow for expected growth:

PUT /baking_v1
{
  "settings": {
    "number_of_shards": 3
  }
}

The next step is to migrate the data from the shared index into the dedicated index, which can be done using a scroll query and the bulk API. As soon as the migration is finished, the index alias can be updated to point to the new index:

POST /_aliases
{
  "actions": [
    { "remove": { "alias": "baking", "index": "forums"    }},
    { "add":    { "alias": "baking", "index": "baking_v1" }}
  ]
}

Updating the alias is atomic; it’s like throwing a switch. Your application continues talking to the baking API and is completely unaware that it now points to a new dedicated index.

The dedicated index no longer needs the filter or the routing values. We can just rely on the default sharding that Elasticsearch does using each document’s _id field.

The last step is to remove the old documents from the shared index, which can be done by searching using the original routing value and forum ID and performing a bulk delete.

The beauty of this index-per-user model is that it allows you to reduce resources, keeping costs low, while still giving you the flexibility to scale out when necessary, and with zero downtime.