One Big Useredit

Big, popular forums start out as small forums. One day we will find that one shard in our shared index is doing a lot more work than the other shards, because it holds the documents for a forum that has become very popular. That forum now needs its own index.

The index aliases that we’re using to fake an index per user give us a clean migration path for the big forum.

The first step is to create a new index dedicated to the forum, and with the appropriate number of shards to allow for expected growth:

PUT /baking_v1
{
  "settings": {
    "number_of_shards": 3
  }
}

The next step is to migrate the data from the shared index into the dedicated index, which can be done using scan-and-scroll and the bulk API. As soon as the migration is finished, the index alias can be updated to point to the new index:

POST /_aliases
{
  "actions": [
    { "remove": { "alias": "baking", "index": "forums"    }},
    { "add":    { "alias": "baking", "index": "baking_v1" }}
  ]
}

Updating the alias is atomic; it’s like throwing a switch. Your application continues talking to the baking API and is completely unaware that it now points to a new dedicated index.

The dedicated index no longer needs the filter or the routing values. We can just rely on the default sharding that Elasticsearch does using each document’s _id field.

The last step is to remove the old documents from the shared index, which can be done with a delete-by-query request, using the original routing value and forum ID:

DELETE /forums/post/_query?routing=baking
{
  "query": {
    "term": {
      "forum_id": "baking"
    }
  }
}

The beauty of this index-per-user model is that it allows you to reduce resources, keeping costs low, while still giving you the flexibility to scale out when necessary, and with zero downtime.