ActiveRecord to Repository: Changing Persistence Patterns with the Elasticsearch Rails Gem

One of the Elasticsearch Rails integration gems, elasticsearch-persistence, provides a persistence layer for Ruby domain objects. Up through the 5.x series of this gem, users could choose between the ActiveRecord and Repository patterns. With elasticsearch-persistence 6.0, the ActiveRecord pattern has been deprecated and removed. We realize that this means some of our users will have to invest additional time migrating their applications, but we are convinced it will pay off in the long run.

If you have an existing app using the gem's ActiveRecord pattern and want to upgrade to 6.0, this post is for you. If you are starting a new app with the Repository pattern, you may also find this guide useful.

Reason for Deprecation

While the ActiveRecord pattern of the elasticsearch-persistence gem was attractive for easily transitioning from a Rails app backed by a relational database to one using Elasticsearch, it introduces technical and conceptual difficulties over time. Rails was originally written to be used with a relational database, and its semantics are at odds with an inherently non-relational storage option like Elasticsearch. The GitHub issues we've seen opened over the years for the elasticsearch-persistence gem are largely due to the gap between what ActiveRecord semantics promise and what a non-relational storage option like Elasticsearch can provide.

For this reason, we encourage our users to decouple domain objects from the persistence layer access code with the Repository pattern. With this pattern, users can define their Ruby objects as models and keep persistence code and rich Elasticsearch queries in a separate repository class. The features of Elasticsearch go well beyond what a relational database provides and can be used easily in a Repository class. Using the Repository pattern frees up your code to make the most of Elasticsearch without the structure and schema constraints of an ActiveRecord model definition.

Example App: Music

The 5.x branch of the elasticsearch-persistence repo provided a Rails template for an app demonstrating use of the ActiveRecord pattern. The app has two models, Artist and Album, which were persisted in the same index using the join datatype. While it's completely valid to employ this datatype, the example migration is easier to follow if the app persists artists and albums in separate indices. Please see the end of this article for a few notes on the join datatype.

We updated the base app to persist artists and albums in separate indices with the following association: an artist can have many albums and an album belongs to a single artist. Then we migrated the app to use the Repository pattern in a series of commits. This app is demonstrative in nature and is far from feature-rich, but the goal of this guide is to illustrate and document the changes necessary to migrate an app from the ActiveRecord pattern to the Repository pattern.

In the guide below you will find:

  • Explanations organized into various components of the Rails app
  • Checklists serving as a reference for your migration
  • In-depth explanations for significant changes
  • Code snippets taken from the reference commits

Repository Classes

Checklist:

  1. Include Elasticsearch::Persistence::Repository
  2. Include Elasticsearch::Persistence::Repository::DSL (if class-level configuration is needed)
  3. Define document_type, index_name, klass on an instance or at the class-level with the DSL module. We recommend using the default document_type '_doc'.*
  4. Define mappings on a repository instance or at the class-level with the DSL module
  5. Define a #deserialize method for handling raw hashes returned from Elasticsearch queries. If the index contains documents corresponding to multiple model types, handle instantiation routing in this method
  6. Define methods for running custom, frequently-used queries
  7. Define explicit #save methods if certain options (e.g. routing) need to be used
  8. Write tests
  * The ability to define a document type is deprecated and will be removed in future versions of Elasticsearch.

Naming

class ArtistRepository
end

Give your Repository class a name that associates it with the models it will be responsible for querying, serializing, and deserializing. Typically, there is a 1:1 mapping between a repository class and an Elasticsearch index. If each of your models is persisted in its own index, you would have one repository class for each model.

Include mixin(s)

Mixin(s) - Example Code

class ArtistRepository
  include Elasticsearch::Persistence::Repository
  include Elasticsearch::Persistence::Repository::DSL
end

Include the Elasticsearch::Persistence::Repository module. This will enrich the class with methods and provide access to a client, used to make requests. If you'd like to set configurations for the repository at the class-level, include the Elasticsearch::Persistence::Repository::DSL module. All instances of the repository class will then use the class-level configurations as a default. The settings can always be overridden at instantiation via an options argument.

Document Type, Index Name and klass

class ArtistRepository
...
  index_name 'artists'
  klass Artist
...
end

Decide on the document_type first. With Elasticsearch 6.x, multiple types in a single index are no longer supported, and in future versions of Elasticsearch, the API will default to a document "type" of '_doc'. Therefore, for forward compatibility, the ArtistRepository does not set a document_type and relies on the default, '_doc'; you might want to do the same. Also, define the index name and the class (klass) that should be used to instantiate a new object from a document returned from Elasticsearch.

Mappings

Mappings - Example Code

class ArtistRepository
...
  mapping do
    indexes :name, analyzed_and_raw
    indexes :members, analyzed_and_raw
    indexes :profile
    indexes :members_combined, { analyzer: 'snowball' }
    indexes :artist_suggest, {
      type: 'object',
      properties: {
        name: { type: 'completion' },
        members: { type: 'completion' }
      }
    }
  end
...
end

Define mappings for the ArtistRepository and AlbumRepository. These mappings will be applied when #create_index! is called on the repository. You may want to define a rake task for creating each of your application's indices with their respective mappings.

Deserialization

Deserialization - Example Code

class ArtistRepository
...
  def deserialize(document)
    artist = super
    artist.id = document['_id']
    artist
  end
...
end

Define a #deserialize method on the ArtistRepository and AlbumRepository. Note that we must set the id field on the instantiated objects so that the id attribute is properly accessible for each model object.
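To see the id handling in isolation, here is a minimal sketch that runs without Elasticsearch. The FakeRepository module is a hypothetical stand-in for the gem's mixin; it assumes the default #deserialize builds the model from the document's '_source', and the sample hit is made up for illustration.

```ruby
# Hypothetical stand-in for the gem's mixin so the sketch runs on its own.
# ASSUMPTION: the default #deserialize instantiates klass from '_source'.
module FakeRepository
  def deserialize(document)
    klass.new(document['_source'])
  end
end

# Simplified model with explicit attributes.
class Artist
  attr_accessor :id, :name

  def initialize(attributes = {})
    attributes.each { |key, value| public_send("#{key}=", value) }
  end
end

class ArtistRepository
  include FakeRepository

  def klass
    Artist
  end

  # The id lives in the hit metadata ('_id'), not in '_source',
  # so copy it onto the model after the default deserialization.
  def deserialize(document)
    artist = super
    artist.id = document['_id']
    artist
  end
end

# A made-up search hit in the shape Elasticsearch returns.
hit = { '_id' => 'abc123', '_source' => { 'name' => 'Radiohead' } }
artist = ArtistRepository.new.deserialize(hit)
puts "#{artist.id}: #{artist.name}"
```

Without the override, the id would stay nil because it is not part of the document source.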

Query Definitions

Query Definitions - Example Code

class ArtistRepository
...
  def all(options = {})
    search({ query: { match_all: { } } },
           { sort: 'name.raw' }.merge(options))
  end
...
end

Next, define queries that are common in your application. For example, we define #all for both the ArtistRepository and AlbumRepository.

Custom Queries

Custom Queries - Example Code

class AlbumRepository
...
  def albums_by_artist(artist)
    search(query: { match: { artist_id: artist.id } })
  end

  def album_count_by_artist(artist)
    count(query: { match: { artist_id: artist.id } })
  end
...
end

We know that our application will also need to execute some custom queries repeatedly so we will make them methods on the repository classes. We would otherwise need to execute them via the #search method. For example, we'll need to retrieve the album documents given a particular artist, and retrieve the count of albums, given a particular artist. We can define these methods on the AlbumRepository class. We'll also define the body used for a suggest request for the AlbumRepository and ArtistRepository.
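As a rough illustration of the suggest body mentioned above, the following sketch builds a request body for the completion fields declared in the artist_suggest mapping. The module name, the suggester names (artist_name, artist_members), and the exact body layout are assumptions for illustration, not the code from the reference commits.

```ruby
# Hypothetical module; in the app this method would live on ArtistRepository.
# ASSUMPTION: the completion fields are artist_suggest.name and
# artist_suggest.members, as declared in the mapping shown earlier.
module ArtistSuggestBody
  def suggest_body(term)
    {
      suggest: {
        artist_name: {
          text: term,
          completion: { field: 'artist_suggest.name' }
        },
        artist_members: {
          text: term,
          completion: { field: 'artist_suggest.members' }
        }
      }
    }
  end
end

# Stand-in class so the sketch runs without the gem.
class ArtistRepositorySketch
  include ArtistSuggestBody
end

body = ArtistRepositorySketch.new.suggest_body('rad')
puts body.inspect
```

Keeping the body in the repository means callers only pass a term and never need to know the field names.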

Tests

Repository Tests - Example Code

It's important we ensure that our persistence and search methods are working properly, so we'll add tests for each of the repositories. We'll test that the correct mapping is used to create an index, that artists and albums are correctly serialized and deserialized, and that our custom queries execute as expected.

Models: Artist model, Album model

Checklist:

  1. Remove Elasticsearch::Persistence::Model module
  2. Include Elasticsearch::Model (if necessary)
  3. Include ActiveModel modules, if needed, e.g.:
     a. ActiveModel::Naming
     b. ActiveModel::Model
     c. ActiveModel::Validations
  4. Define Validations, either custom Validators or simple validations available via ActiveModel::Validations
  5. Explicitly define associations as attributes
  6. Define defaults for attributes
  7. Add id as an explicit attribute
  8. Define #to_hash
     a. If custom logic is needed
     b. If there are associations whose id we need to delete from the persisted representation of the document
  9. Define #persisted? method, if necessary. Form helpers sometimes rely on it
  10. Update tests

Artist Model

Artist Model - Example Code

class Artist
  include ActiveModel::Model
  include ActiveModel::Validations
end

We'll still have an Artist model defined in our app, but it's no longer persisted via methods on the instances themselves, as it was with the ActiveRecord pattern. We'll include some other mixins to keep certain functionality, like validations. In the reference commit, we remove the Elasticsearch::Persistence::Model module and include the following modules:

  • ActiveModel::Model: supplies some methods necessary for form helpers
  • ActiveModel::Validations: provides error message caching and validation methods

Mappings

Artist Mappings - Example Code

Remove the mapping option passed to the attribute methods, as this was used by Elasticsearch::Persistence::Model to construct the mapping document sent when creating an index. Instead, define the #mapping on the ArtistRepository using the Elasticsearch::Persistence::Repository::DSL module. Note that this is used by the ArtistRepository when #create_index! is called.

Validations

Artist Validations - Example Code

class Artist
...
  validates :name, presence: true
...
end


class ArtistRepository
...
  def serialize(artist)
    artist.validate!
    artist.to_hash.tap do |hash|
      suggest = { name: { input: [ hash[:name] ] } }
      if hash[:members].present?
        suggest[:members] = { input: hash[:members].collect(&:strip) }
      end
      hash.merge!(:artist_suggest => suggest)
    end
  end
...
end

We can still define validations on the Artist model if we include ActiveModel::Validations. However, #validate! must be called explicitly at persistence time. Because the repository is responsible for persisting the objects, we call #to_hash on the artist objects in the ArtistRepository #serialize method and put the call to #validate! there.

Separation of Domain Object Logic and Persistence Logic

Clean up the models by removing methods that are now called on the repositories instead. These methods are those that make requests to Elasticsearch, as we want to keep all interactions with the persistence layer in the repository classes. Also remove anything relating to index configuration or creation in the models.

Custom Methods

Artist Custom Methods - Example Code

class Artist
...
  def persisted?
    !!id
  end
...
end

Some view helpers rely on a #persisted? method on the model object being available, so define one explicitly.

Tests

Artist Tests - Example Code

Album Model

Album Model - Example Code

The Album model gets the same makeover as the Artist model. The first step is to remove the Elasticsearch::Persistence::Model mixin and include other necessary mixins. See the first step in the Artist model migration for details.

Mapping

Define a mapping for the Album model using Elasticsearch::Persistence::Repository::DSL on the AlbumRepository and remove methods related to defining and creating an index.

Associations and Validations

Album Associations and Validations - Example Code

class Album
  class Validator < ActiveModel::Validator

    ERROR_MESSAGE = 'An album must be associated with an artist.'.freeze

    def validate(album)
      unless album.title && album.artist && album.artist.persisted?
        album.errors.add(:base, ERROR_MESSAGE)
      end
    end
  end
end

We want to require that an album object be associated with an artist before it is persisted. Doing so requires slightly complex logic so we extract the code into a custom Validator. As we did with the Artist model, call #validate! explicitly in the #serialize method on the album repository.

Separation of Domain Object Logic and Persistence Logic

Remove methods relating to persistence that should be called on the AlbumRepository instance instead.

Id Attribute

The id attribute is not automatically assumed or handled for either the artist or album model, so we add an explicit id attribute.
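A minimal sketch of an album object with an explicit id and a #to_hash that stores only the artist's id might look like the following. The attribute names and the choice to leave id out of the hash (on the assumption that it is kept in the document's '_id' metadata) are illustrative, not taken from the reference commits; the Artist struct is a stand-in.

```ruby
# Stand-in for the real Artist model.
Artist = Struct.new(:id, :name)

class Album
  # id is declared explicitly; nothing adds it for us.
  attr_accessor :id, :title, :artist

  def initialize(attributes = {})
    attributes.each { |key, value| public_send("#{key}=", value) }
  end

  # Used by form helpers to tell new objects from saved ones.
  def persisted?
    !id.nil?
  end

  # Persisted representation: reference the artist by id instead of
  # embedding the whole associated object.
  def to_hash
    { title: title, artist_id: artist && artist.id }
  end
end

artist = Artist.new('artist-1', 'Radiohead')
album = Album.new(id: 'album-1', title: 'Kid A', artist: artist)
puts album.to_hash.inspect
```

The artist_id field is what a query such as #albums_by_artist matches on.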

Tests

Album Tests - Example Code

Suggester

Checklist:

  1. Change all methods called on model objects relating to persistence and search to use the repository object instead

Suggester - Example Code

class Suggester
...
  def execute!(*repositories)
    @responses ||= []
    repositories.each do |repository|
      @responses << begin
        repository.client.search(index: repository.index_name,
                                 body: repository.suggest_body(@term))
      end
    end
  end
...
end

Our suggester object should use a repository when doing custom queries instead of the Artist model.

Tests

Suggester Tests - Example Code

We also update the Suggester tests to use a repository instead.

Rails Initializer

Checklist:

  1. Decide where to define the repository object
     a. In a controller?
     b. In an initializer?

Rails Initializer - Example Code

There are a number of ways to define the repository object(s) used in the app. They can be set up in the initializer as constants or global variables. Alternatively, they can be created as instance variables in each controller when requests are handled. You should consider how expensive it is to instantiate repository objects when choosing which method to use.

Controllers: Artists controller, Albums controller

Checklist:

  1. Change all methods called on model objects relating to persistence to use the repository object instead
  2. Extract the id from the response returned after calling #save on the repository. Set it on the newly-persisted model object

Controllers - Example Code

class ArtistsController < ApplicationController
...
  def index
    @artists = $artist_repository.all(sort: 'name.raw')
  end
...
end

The controllers are updated to use the repositories. When calling #save on the model object with the ActiveRecord pattern, the document with its new id was returned as a whole entity. With the repository object, only the id of the indexed document is returned, so ensure that this is handled appropriately. For example, we assign the new id to the newly persisted artist object so that it's considered persisted by the form helper.

Alternatively, you can define a custom #save method on the repository class that sets the id.
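One way such a custom #save could look is sketched below. FakeRepository is a hypothetical stand-in for the gem's mixin so the example runs without Elasticsearch; it assumes #save returns the index response as a hash that includes the new document's '_id'.

```ruby
# Hypothetical stand-in for the gem's mixin.
# ASSUMPTION: #save returns a response hash containing '_id'.
module FakeRepository
  def save(document, options = {})
    @saved_count = @saved_count.to_i + 1
    { '_id' => @saved_count.to_s, 'result' => 'created' }
  end
end

class Artist
  attr_accessor :id, :name

  def initialize(name)
    @name = name
  end

  def persisted?
    !id.nil?
  end
end

class ArtistRepository
  include FakeRepository

  # Persist the artist, then copy the generated id back onto the object
  # so callers (and form helpers) see it as persisted.
  def save(artist, options = {})
    response = super
    artist.id = response['_id']
    response
  end
end

repository = ArtistRepository.new
artist = Artist.new('Radiohead')
repository.save(artist)
puts "persisted: #{artist.persisted?}, id: #{artist.id}"
```

With this in place, controllers no longer need to extract the id from the response themselves.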

URLs and Routes

No updates needed!

Views

Checklist:

  1. Update any queries using model objects to use a repository instead

Views - Example Code

class ArtistsController < ApplicationController
...
  def show
    @albums = $album_repository.albums_by_artist(@artist)
  end
...
end

Where we once relied on model methods to access associations, we'll now change the code to use the repository to do object retrieval. For example, we use our custom #albums_by_artist method on the album repository to retrieve the albums by a given artist. We must make other changes in the artists and search views to use the repository instead of the object instances.

Important Notes

Defining Attributes on POROs (Plain Old Ruby Objects)

There are a number of gems that allow us to define attributes with types on POROs, essentially allowing us to define a schema for Ruby objects directly in their model files. We can choose one of these libraries and use it, or we can handle our attributes explicitly with custom code. The base app using the ActiveRecord pattern was written at a time when Virtus was a popular gem for defining attributes on Ruby objects. Since then, other gems have gained popularity in the community.

In order to demonstrate that we don't necessarily need to depend on another gem to attain attribute functionality, we handle the model attributes explicitly. You are still free to use whatever gem you'd like to enrich your PORO.
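As a rough idea of what handling attributes without a gem can involve, the sketch below implements a tiny attribute macro with defaults in plain Ruby. The ExplicitAttributes module and its attribute method are hypothetical names for illustration; the reference app may handle attributes differently.

```ruby
# Hypothetical helper: a small attribute macro with per-attribute defaults.
module ExplicitAttributes
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare an attribute and remember its default value.
    def attribute(name, default: nil)
      attribute_defaults[name] = default
      attr_accessor name
    end

    def attribute_defaults
      @attribute_defaults ||= {}
    end
  end

  # Accept symbol or string keys; fall back to a copy of the default so
  # instances never share one mutable default object.
  def initialize(attributes = {})
    self.class.attribute_defaults.each do |name, default|
      value = attributes.fetch(name) { attributes.fetch(name.to_s) { default.dup } }
      public_send("#{name}=", value)
    end
  end

  def to_hash
    self.class.attribute_defaults.keys.each_with_object({}) do |name, hash|
      hash[name] = public_send(name)
    end
  end
end

class Artist
  include ExplicitAttributes

  attribute :id
  attribute :name
  attribute :members, default: []
end

first = Artist.new(name: 'Radiohead')
first.members << 'Thom Yorke'
second = Artist.new('name' => 'Portishead')
puts first.to_hash.inspect
puts second.members.inspect
```

This covers defaults and hash conversion only; type coercion, which gems usually provide, is left out.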

The Elasticsearch Join Datatype

The join datatype is a special field that creates parent/child relations between documents in the same index. Using the join datatype with the has_parent or has_child query adds significant overhead to query performance. That said, the join datatype remains a valid choice when your schema genuinely needs parent/child relations. A child document must be indexed on the same shard as its parent and the rest of its lineage, so we must always route child documents using their parent id. If you persist parent and child documents in a single index, ensure that the routing values are taken into account when both saving and retrieving child documents.
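To make the routing requirement concrete, this sketch builds the kind of parameter hashes one might pass to the client when indexing and fetching a child album. The index name 'music', the join field name 'join_field', the relation name 'album', and both helper methods are assumptions for illustration; the hashes only show where the routing value goes.

```ruby
# Hypothetical helpers; names and index layout are illustrative only.

# Parameters for indexing a child (album) document. The parent id is
# used both in the join field and as the routing value.
def child_index_params(album_id, album_attributes, artist_id)
  {
    index: 'music',
    id: album_id,
    routing: artist_id,
    body: album_attributes.merge(
      join_field: { name: 'album', parent: artist_id }
    )
  }
end

# Parameters for fetching the same child later. The same routing value
# is needed so the request reaches the shard holding the document.
def child_get_params(album_id, artist_id)
  { index: 'music', id: album_id, routing: artist_id }
end

index_params = child_index_params('album-1', { title: 'Kid A' }, 'artist-1')
get_params = child_get_params('album-1', 'artist-1')
puts index_params.inspect
puts get_params.inspect
```

In a repository, these details would typically sit behind explicit #save and #find methods so that callers never handle routing directly.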

Wrapping Up / Reaching Out

We hope that this guide has been useful for you whether you are starting a new app using the Repository pattern or migrating an existing one from the ActiveRecord pattern. If you have any questions, don't hesitate to reach out to us via an issue in the elasticsearch-rails repo.