Dell has deployed two Elasticsearch clusters on Windows servers in its data centers, and the Dell Search Platform is based on the .NET Framework. One is a search cluster that powers the search experience on Dell.com, and the other is an analytics cluster that tracks search-related user activity on the site. The analytics cluster lets Dell deliver search results that are crowdsourced and influenced by user behavior, and it also provides insight into how the search platform is used.
The Dell Search Cluster
The Dell search cluster holds an extremely comprehensive data set because it indexes everything on Dell.com: more than 27 million documents, including all the products that can be purchased on the site, all the downloadable drivers for those products, troubleshooting articles, knowledge-base documents, product manuals, and videos and video metadata, to name a few.
The documents themselves are also extremely rich. For example, a product document includes all the information related to that particular product: the product title, its description, the image link, keywords, meta information for the technical specifications (RAM size, processor type, resolution, etc.), stock status, which indicates how many days it will take to ship the product, pricing information, department category, and more.
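As a rough illustration of how such a rich product document might be described to Elasticsearch, the sketch below builds an index mapping as a plain Python dictionary. The field names and type choices are assumptions made for illustration and are not Dell's actual schema; only the kinds of information listed above come from the description.

```python
import json

# Hypothetical mapping for a product document. Field names are illustrative;
# text, keyword, integer, and float are standard Elasticsearch field types.
PRODUCT_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "description": {"type": "text"},
            # Stored but not searchable: the image link is only displayed.
            "image_url": {"type": "keyword", "index": False},
            "keywords": {"type": "keyword"},
            "specs": {
                "properties": {
                    "ram_gb": {"type": "integer"},
                    "processor_type": {"type": "keyword"},
                    "resolution": {"type": "keyword"},
                }
            },
            "days_to_ship": {"type": "integer"},
            "price": {"type": "float"},
            "department": {"type": "keyword"},
        }
    }
}


def field_paths(properties, prefix=""):
    """Flatten a mapping's properties into dotted field paths."""
    paths = []
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            paths.extend(field_paths(spec["properties"], path + "."))
        else:
            paths.append(path)
    return paths


if __name__ == "__main__":
    print(json.dumps(PRODUCT_MAPPING, indent=2))
    print(field_paths(PRODUCT_MAPPING["mappings"]["properties"]))
```

Full-text fields such as the title use `text` so they can be analyzed, while codes and categories use `keyword` so they can be filtered and aggregated exactly.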
The Dell Analytics Cluster
The Dell analytics cluster, which currently holds more than 1 billion documents, indexes every click on Dell.com that comes from a search experience. Dell uses this data to analyze the top-performing queries, the top-performing categories, and various other metrics, and then makes actionable, dynamic improvements to the site, whether by improving the relevancy of search results by ranking popular products higher or by serving results from the right category based on a visitor's query.
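One plausible way to find top-performing queries in a click index is an Elasticsearch terms aggregation. The sketch below builds such a request body and, alongside it, performs the same counting in memory to show what the aggregation computes. The `query` field name and the shape of a click document are assumptions, not details of Dell's analytics cluster.

```python
from collections import Counter


def top_queries_request(size=10):
    """Request body for a terms aggregation over a hypothetical `query` keyword field."""
    return {
        "size": 0,  # return only aggregation buckets, not individual click documents
        "aggs": {"top_queries": {"terms": {"field": "query", "size": size}}},
    }


def top_queries_local(clicks, size=10):
    """Count clicks per query in memory, mirroring what the aggregation returns."""
    counts = Counter(click["query"] for click in clicks)
    return counts.most_common(size)


if __name__ == "__main__":
    # Made-up click documents, for illustration only.
    sample_clicks = [
        {"query": "laptop", "category": "laptops"},
        {"query": "monitor", "category": "displays"},
        {"query": "laptop", "category": "laptops"},
    ]
    print(top_queries_request(size=5))
    print(top_queries_local(sample_clicks, size=5))
```

Swapping the aggregated field for a category field would give top-performing categories in the same way.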
Dell's Linguistics Pipelines
To deliver accurate search results in all languages, Dell created an extensive linguistic pipeline for each language. The pipelines use Elasticsearch's language analyzers, stopword removal, spell check, synonym matching, stemming, and other features to make the query more accurate. Dell also added a final step at the end of its linguistic pipelines called a catch-all influencer, which is essentially an offline aggregator that helps identify the entities in the query the customer entered. This aggregator runs across multiple systems, such as the content management system and the master lookup tables in various databases, and, depending on what the customer typed into the search bar, maps the product category to the product category code, the manufacturer name to the manufacturer code, and so on. These inputs, enriched with analytics and customer identification data, are then passed to a probability engine that helps Dell rewrite the final query. This context significantly improves Dell's understanding of what the user expects when performing a search.
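The two ideas above can be sketched in a few lines. The first part shows what an English analysis chain might look like as Elasticsearch index settings, using the built-in `lowercase`, `stop`, `synonym`, and `stemmer` token filters. The second part is a deliberately simplified stand-in for the entity-mapping step: it uses plain dictionary lookups and does not model the offline aggregator or the probability engine. All lookup tables, codes, field names, and the synonym rule are hypothetical.

```python
# Hypothetical analysis settings for one language (English).
ANALYSIS_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "english_stop": {"type": "stop", "stopwords": "_english_"},
                "english_synonyms": {"type": "synonym", "synonyms": ["notebook, laptop"]},
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "english_pipeline": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_stop",
                               "english_synonyms", "english_stemmer"],
                }
            },
        }
    }
}

# Hypothetical lookup tables standing in for the master lookup tables.
CATEGORY_CODES = {"laptops": "CAT-LAPTOP", "monitors": "CAT-MONITOR"}
MANUFACTURER_CODES = {"dell": "MFR-DELL"}


def rewrite_query(user_query):
    """Map recognized entities to code filters; keep the rest as free text."""
    filters, remaining = [], []
    for token in user_query.lower().split():
        if token in CATEGORY_CODES:
            filters.append({"term": {"category_code": CATEGORY_CODES[token]}})
        elif token in MANUFACTURER_CODES:
            filters.append({"term": {"manufacturer_code": MANUFACTURER_CODES[token]}})
        else:
            remaining.append(token)
    text = " ".join(remaining)
    # Fall back to match_all when every token was recognized as an entity.
    must = [{"match": {"title": text}}] if text else [{"match_all": {}}]
    return {"query": {"bool": {"must": must, "filter": filters}}}


if __name__ == "__main__":
    print(rewrite_query("Dell laptops 16GB"))
```

Placing the recognized entities in the `filter` clause restricts results to the matching category and manufacturer without affecting relevance scoring, while the remaining free text is still scored by the `match` clause.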