DC Thomson’s Journey with Elastic
DC Thomson is a private company and one of the leading media organisations in the UK, headquartered in Dundee, Scotland. The company publishes newspapers, magazines and books and has diversified into new media, digital technology, retail, radio and television through investment interests. They are known for publishing newspapers including The Press and Journal, The Courier, and the Sunday Post. Their magazine stable includes titles such as The People’s Friend, The Beano and My Weekly.
DC Thomson deals with massive data sets from their various sites, which used to run on Solr. At the time, the digital team was facing a lot of technical and scalability issues, which is when Solr was weighed against Elasticsearch. The majority of the newspaper sites were using Solr for search; however, the following characteristics made Elasticsearch the right choice for them:
- Easy to set up
- Handles massive data sets
- Growing client base, including Stack Overflow, GitHub, Facebook, and the Guardian
- Not just for search – it’s a technology stack (the Elastic Stack)
- “Plug and play” approach – adding additional nodes made easy
- Distributed indexing – faster and more reliable
- Data Import plugins for more data sources (e.g., MongoDB, Twitter Streams, or Wordpress)
- Range of APIs (PHP, .NET, Java, etc.)
- Feature rich (ES REST API)
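As an illustration of the REST API mentioned in the list above, here is a minimal sketch of how article documents could be packaged for Elasticsearch's `_bulk` endpoint, which accepts newline-delimited JSON. The index name `articles`, the fields `id`, `title` and `body`, and the sample data are hypothetical and are not taken from DC Thomson's setup.

```python
import json


def build_bulk_payload(index_name, articles):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API."""
    lines = []
    for article in articles:
        # Action line: index this document under the given ID.
        action = {"index": {"_index": index_name, "_id": article["id"]}}
        # Source line: the document fields, without the ID used above.
        source = {key: value for key, value in article.items() if key != "id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The _bulk API expects every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


# Hypothetical sample data, for illustration only.
sample_articles = [
    {"id": "a1", "title": "Example headline one", "body": "..."},
    {"id": "a2", "title": "Example headline two", "body": "..."},
]
payload = build_bulk_payload("articles", sample_articles)
```

The resulting string would be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header.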
Below is an overview of the architectural setup at DC Thomson. In the middle, there is the actual Elasticsearch cluster with three nodes, which are to be expanded in the near future. The environment on the left consists of different data sources (Wordpress, Magento, and Scot Ads) that directly communicate with the Elasticsearch cluster via the bridge. To make sure all Elasticsearch clusters are healthy and running, DC Thomson is using X-Pack for monitoring.
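Alongside X-Pack monitoring, Elasticsearch exposes a cluster health endpoint (`GET /_cluster/health`) whose JSON response includes a `status` of green, yellow or red and a `number_of_nodes` count. The sketch below is not DC Thomson's actual monitoring setup; it only shows how such a response could be checked against the three-node cluster described above. The function name and the sample response are hypothetical.

```python
def check_cluster_health(health, expected_nodes=3):
    """Return a list of human-readable problems found in a _cluster/health response."""
    problems = []
    status = health.get("status")
    # Anything other than green means some shards are unassigned or unavailable.
    if status != "green":
        problems.append(f"cluster status is {status}")
    nodes = health.get("number_of_nodes", 0)
    # Fewer nodes than expected suggests a node has dropped out of the cluster.
    if nodes < expected_nodes:
        problems.append(f"only {nodes} of {expected_nodes} nodes are in the cluster")
    return problems


# Hypothetical response, trimmed to the fields used here.
sample_health = {"status": "yellow", "number_of_nodes": 2}
problems = check_cluster_health(sample_health)
```

An empty list means the cluster looks healthy by these two checks; `expected_nodes` would need to be raised when the cluster is expanded.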
Currently, Elasticsearch is used to index the content of four of their major sites (energyvoice.com, pressandjournal.co.uk, eveningexpress.co.uk and eveningtelegraph.co.uk) and to provide related articles, which increases page views and advert revenue. The ultimate goal is to migrate most of the news sites off the third-party CMS they run on now (Polopoly) and onto the Wordpress cluster. Because the Wordpress plugin works with Elasticsearch, the migrated sites will automatically be running on Elasticsearch.
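The source does not say how the related articles are computed. One common way to do it in Elasticsearch is the `more_like_this` query, and the sketch below builds such a request body under that assumption. The index name `articles` and the fields `title` and `body` are hypothetical.

```python
def related_articles_query(index_name, article_id, size=5):
    """Build a search body that asks for documents similar to one already indexed."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                # Fields to compare; hypothetical names.
                "fields": ["title", "body"],
                # Use an already indexed article as the example to match against.
                "like": [{"_index": index_name, "_id": article_id}],
                # Consider terms that appear at least once in the source article.
                "min_term_freq": 1,
                # Cap the number of terms used to build the similarity query.
                "max_query_terms": 12,
            }
        },
    }


query = related_articles_query("articles", "a1")
```

The body would be sent to the index's `_search` endpoint, and the hits that come back are candidates for the related article links.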
Since the change from Solr to Elasticsearch, DC Thomson has seen page views increase by up to 40%, mainly as a result of the added ‘related article’ function, which serves readers related and relevant content. By adding class names to the related article links, DC Thomson is able to track the clicks. Within just two weeks of having Elasticsearch in place, they were already seeing an increase in page views, which helped hugely to justify their choice of solution.
Furthermore, the bounce rate has decreased, the number of pages per session has gone up, and the general user experience is much faster and smoother than before. All of this benefited DC Thomson hugely, as it showed that they can monetize Elastic, given that they have a paywall on the Press and Journal and Energy Voice sites. It is important for their business to move people around their sites and feed them the information they are looking for, and Elastic plays a crucial part in this.
Another use case that DC Thomson has on their road map is an editorial backend for journalists that would allow them to investigate previous articles and render the homepages to see how popular certain articles are. With the ELK stack, the digital director can show immediate results, which is what the board wants to see and hear.
Being able to monitor page views will allow journalists to curate content and tell users what the best stories are by implementing trending and most-read widgets on the site. Ultimately, the goal for DC Thomson is to dynamically show the most popular articles (most reads, social shares, etc.) on the homepage. Finally, getting insights into readers’ data (location, behaviour on the site, etc.) will further allow DC Thomson to see who they really want to target with a personalised experience when visiting the news page (“These are the articles you should be interested in”).
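A most-read widget of the kind described above could be driven by a terms aggregation over page view events. The sketch below is one possible approach, not DC Thomson's implementation: it assumes a hypothetical index in which each page view is a document with a keyword field `article_id` and a date field `@timestamp`.

```python
def most_read_query(hours=24, top_n=10):
    """Build a search body that counts page views per article in a recent window."""
    return {
        # No individual hits are needed, only the aggregation.
        "size": 0,
        # Restrict the count to page views from the last `hours` hours.
        "query": {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
        "aggs": {
            # One bucket per article, ordered by document count by default.
            "most_read": {"terms": {"field": "article_id", "size": top_n}}
        },
    }


def parse_most_read(response):
    """Turn the aggregation part of a search response into (article_id, views) pairs."""
    buckets = response["aggregations"]["most_read"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical response fragment, for illustration only.
sample_response = {
    "aggregations": {
        "most_read": {
            "buckets": [
                {"key": "a2", "doc_count": 120},
                {"key": "a1", "doc_count": 75},
            ]
        }
    }
}
ranking = parse_most_read(sample_response)
```

Keeping the time window as a parameter lets the same query serve both a short-lived "trending" list and a longer-running "most read" list.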