Inside a Shard

In Life Inside a Cluster, we introduced the shard and described it as a low-level worker unit. But what exactly is a shard, and how does it work? In this chapter, we answer these questions:

  • Why is search near real-time?
  • Why are document CRUD (create-read-update-delete) operations real-time?
  • How does Elasticsearch ensure that the changes you make are durable, so that they won’t be lost if there is a power failure?
  • Why does deleting documents not free up space immediately?
  • What do the refresh, flush, and optimize APIs do, and when should you use them?

The easiest way to understand how a shard functions today is to start with a history lesson. We will look at the problems that had to be solved in order to provide a distributed, durable data store with near real-time search and analytics.