Website search tutorial

This document guides you through an implementation of website search using the web crawler and Elastic App Search.

For a quicker introduction, see Website search quickstart.

This guide focuses on a concrete use case: building a search experience for a website. For this example, we chose the Elastic blog at https://www.elastic.co/blog.

There are many ways to implement website search, but it usually involves the following steps:

  1. Get your website content into Elasticsearch and keep it up to date.
  2. Integrate search into your website UI.
  3. Improve your search results and continually evaluate their performance.

In this guide, we’ll accomplish the above using the following tools:

  1. Index content into Elasticsearch: We will use the web crawler UI to crawl the website and ingest webpage content, stored as documents in an Elasticsearch index.
  2. Search content: We will create an App Search engine based on the Elasticsearch index. This will allow us to use App Search’s out-of-the-box capabilities to generate a Search UI.
  3. Optimize search results: We will use relevance tuning to optimize the search results.
  4. Analytics and insights: We will use App Search analytics to refine and improve the search experience with synonyms and curations.

Follow the instructions to crawl the Elastic blog. Once you’re comfortable with the steps involved, use this guide as a blueprint for your own website search use cases.

Prerequisites

To use Enterprise Search features you need a subscription, a deployment, and a user. You get all three when you start an Elastic Cloud trial.

Create an Elastic Cloud deployment

This step assumes you are a new Elastic Cloud user. Skip this step if your team already has an Elastic Cloud deployment.

Start by signing up for a free Elastic Cloud trial. After creating an account, you’ll have an active subscription, and you’ll be prompted to create your first deployment.

Follow the steps to Create a new deployment. For more details, refer to Create a deployment in the Elastic Cloud documentation.

The Elastic web crawler was introduced in Enterprise Search 8.4.0, so be sure to use version 8.4.0 or later. The web crawler is not available at all Elastic subscription levels, but free trial users have access to all features.

Ingestion mechanism: the web crawler

We will use the web crawler UI to extract and transform webpage content into Elasticsearch documents.

Create an Elasticsearch index

The crawler will store the indexed content in a search-optimized Elasticsearch index.

When you create a new deployment, you will be taken to a landing page with a number of options. Select Search my data to be taken directly to the Search overview page, where you can create an Elasticsearch index. Alternatively, in the main menu navigate to Search > Content > Indices.

Follow these steps in Kibana to create your index:

  • Select Create an Elasticsearch index.
  • Choose Use the web crawler as your ingestion method.
  • Name your new index, for example elastic-blog.
  • Choose your document language. We’ll use English in this example.

You are now ready to add domains.

When you create a search-optimized Elasticsearch index using a Search ingestion method, the index name is automatically prefixed with search-.

In this example, your index will be named search-elastic-blog.

Add domain and entry points

We now need to add our domain, which will be validated by the crawler. Follow these steps to add your domain and entry points:

  • When you create your index you will be automatically navigated to Manage domains for that index.

    To navigate there manually go to Search > Content > Indices > search-elastic-blog > Manage domains.

  • Enter the domain to be crawled: https://www.elastic.co. The web crawler will not crawl any webpages outside of the defined domains.
  • Review any warnings flagged by the domain validation process. For this exercise, you can ignore any indexing restrictions that are reported.
  • Add the domain.
  • Under Entry points, select Edit and append /blog to the domain URL: https://www.elastic.co/blog/.

The crawler now knows where to start crawling.

Add crawl rules

Use crawl rules to restrict which domain paths get crawled. We’ll set crawl rules to disallow any pages that aren’t part of the blog.

For this example we want to focus on blog posts whose URLs contain the term elasticsearch. We don’t need all pages under https://www.elastic.co/blog, so we’ll also disallow any path that begins with /blog/author and /blog/category.

Add the following crawl rules:

  • Policy Disallow, rule Regex, path pattern .*
  • Policy Allow, rule Begins with, path pattern /blog/
  • Policy Allow, rule Contains, path pattern /blog/*elasticsearch
  • Policy Disallow, rule Begins with, path pattern /blog/author
  • Policy Disallow, rule Begins with, path pattern /blog/category

Here’s what the crawl rules should look like in the Kibana UI:

[Image: the list of crawl rules]

Rules are evaluated in sequential order, so order matters: the first rule that matches a page decides its policy. You can drag and drop the rows to adjust as needed; in particular, the specific Allow and Disallow rules must sit above the catch-all Disallow .* rule, or that rule will deny every page. Don’t worry about the final rule in the list. This default rule simply allows any remaining pages in the domain to be crawled. The sketch below makes this first-match behavior concrete.
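
It’s a minimal TypeScript sketch of how an ordered rule list resolves a URL path. The types, function, and rule objects are illustrative, not the crawler’s actual internals, and the ordering assumes the specific rules sit above the catch-all Disallow rule.

```typescript
// Hypothetical model of ordered crawl rules; not the crawler's internal code.
type Policy = "allow" | "disallow";

interface CrawlRule {
  policy: Policy;
  matches: (path: string) => boolean;
}

// Mirrors the rules above, with the specific rules ahead of the catch-all.
const rules: CrawlRule[] = [
  { policy: "disallow", matches: (p) => p.startsWith("/blog/author") },
  { policy: "disallow", matches: (p) => p.startsWith("/blog/category") },
  { policy: "allow", matches: (p) => p.startsWith("/blog/") && p.includes("elasticsearch") },
  { policy: "allow", matches: (p) => p.startsWith("/blog/") },
  { policy: "disallow", matches: () => true }, // the Regex .* rule
];

// The first rule that matches decides; later rules are never consulted.
function decide(path: string): Policy {
  const rule = rules.find((r) => r.matches(path));
  return rule ? rule.policy : "allow"; // default rule: allow everything
}

console.log(decide("/blog/whats-new-elasticsearch")); // allow
console.log(decide("/blog/author/jane-doe"));         // disallow
console.log(decide("/pricing"));                      // disallow (caught by .*)
```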

We’re ready to start crawling.

Launch and monitor crawl

It’s time to launch your crawl. This should take around 8 minutes.

  1. Select Start a crawl > Crawl all domains on this engine.
  2. Browse the Documents tab for your index to check that pages are being indexed. Each record includes a document’s unique ID, a list of indexed fields, and the field contents.
  3. Next, from the Kibana menu, go to Observability > Logs to monitor the web crawling activity as a live stream.
  4. Search for indexed. You should see entries for pages in the /blog/ directory that were successfully crawled and ingested into your deployment.

    For example: Indexed the document into Elasticsearch with doc_id=622827583d8203857b45e77b

  5. Search for denied. An entry appears for each page that was skipped, and the log message indicates the crawl rule by which the page was excluded. The bulk of these entries will look like this:

    Denied by crawl rule: domain=https://www.elastic.co policy=deny rule=regex pattern=.*

    These log entries help you fine-tune your crawl rules so that only relevant website content is included in your search experience.

It may take a few test crawls to get things exactly right. If a crawl finishes very quickly, without indexing any documents, your crawl rules are too restrictive. Verify your domains and entry points, and check that crawl rules are ordered correctly.
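
Beyond the Documents tab and the logs, you can spot-check the index directly from code. Here is a minimal sketch using the @elastic/elasticsearch Node.js client; the endpoint and API key are placeholders for your own deployment credentials.

```typescript
import { Client } from "@elastic/elasticsearch";

// Placeholders: substitute your deployment's Elasticsearch endpoint and an API key.
const client = new Client({
  node: "https://YOUR_DEPLOYMENT.es.us-east-1.aws.found.io:443",
  auth: { apiKey: "YOUR_API_KEY" },
});

async function inspectCrawl(): Promise<void> {
  // How many pages did the crawler index?
  const { count } = await client.count({ index: "search-elastic-blog" });
  console.log(`Indexed documents: ${count}`);

  // Pull one document to see the fields the crawler extracted (title, url, body_content, ...).
  const result = await client.search({
    index: "search-elastic-blog",
    size: 1,
    query: { match: { title: "elasticsearch" } },
  });
  console.log(result.hits.hits[0]?._source);
}

inspectCrawl().catch(console.error);
```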

Build your search experience

When the crawl has successfully completed, and we’re happy that the crawler has indexed the content we want, it’s time to build our search experience.

We will create an App Search engine from our elastic-blog index. This will allow us to quickly generate, monitor, and refine a search experience. We’ll then add some relevance tuning to optimize search results for our blog content.

Create an App Search engine

First, we need to create an App Search engine. Choose your engine type and configure it:

  1. Open Search > App Search > Engines.
  2. Select Create engine.
  3. Select Elasticsearch index-based.
  4. Name your engine.
  5. Select the Elasticsearch index you’d like to use. This will be our search-elastic-blog index.
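
Once the engine exists, you can also query it programmatically over the App Search search API. A minimal sketch for Node.js 18+ (global fetch); the engine name elastic-blog-engine, the endpoint, and the search key are placeholders for your own values (keys live under App Search > Credentials).

```typescript
// Placeholders: your Enterprise Search endpoint, engine name, and public search key.
const ENDPOINT = "https://YOUR_DEPLOYMENT.ent.us-east-1.aws.found.io";
const SEARCH_KEY = "search-xxxxxxxxxxxx";

async function searchBlog(query: string): Promise<void> {
  const response = await fetch(`${ENDPOINT}/api/as/v1/engines/elastic-blog-engine/search`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${SEARCH_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query }),
  });
  const { results } = await response.json();
  // App Search wraps each field value in an object with a "raw" property.
  for (const result of results) {
    console.log(result.title?.raw, "->", result.url?.raw);
  }
}

searchBlog("security").catch(console.error);
```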

Add relevance tuning

App Search allows you to add relevance tuning to your engine. This ensures users get the best possible results from their search queries.

  1. Open the Relevance Tuning page.
  2. In the Preview pane, type in a search term, for example "security".
  3. Check the search results. If you expect users to find results with the word security in the title field more relevant, consider increasing the weight of that field.
  4. In Manage fields, find the title field and open the collapsed section.
  5. Use the slider to adjust the weight level while watching the search results. Notice that as you increase the weight of title, the results change.
  6. Use the Save and Restore buttons at the top of the Relevance Tuning page to save or undo your changes.
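
The UI sliders write to the engine’s search settings, which you can also drive over the REST API. Here is a hedged sketch of an equivalent call; the endpoint, engine name, private key, and the specific field weights are placeholder, illustrative values.

```typescript
// Placeholders: your Enterprise Search endpoint, engine name, and a private key.
const ENDPOINT = "https://YOUR_DEPLOYMENT.ent.us-east-1.aws.found.io";
const PRIVATE_KEY = "private-xxxxxxxxxxxx";

async function boostTitleField(): Promise<void> {
  const response = await fetch(
    `${ENDPOINT}/api/as/v1/engines/elastic-blog-engine/search_settings`,
    {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${PRIVATE_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        search_fields: {
          title: { weight: 5 },        // weight title matches more heavily
          body_content: { weight: 1 }, // keep other fields at the baseline
        },
      }),
    }
  );
  console.log(await response.json());
}

boostTitleField().catch(console.error);
```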

Configure your search experience

With the query results optimized, it’s now time to set up our search experience.

Search UI is a React library maintained by Elastic that allows you to quickly implement search experiences. We’ll use App Search’s built-in Search UI page to bootstrap a Search UI based on our engine in a few clicks.

You can see a live preview of your search experience within App Search. You can also download the generated source code, as a blueprint for your own development.
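
Under the hood, the generated app wires Search UI to your engine through the App Search connector. A minimal hand-written equivalent looks roughly like this; the engine name, endpoint, and search key are placeholders.

```tsx
import AppSearchAPIConnector from "@elastic/search-ui-app-search-connector";
import { SearchProvider, SearchBox, Results } from "@elastic/react-search-ui";

// Placeholders: your engine name, Enterprise Search endpoint, and public search key.
const connector = new AppSearchAPIConnector({
  engineName: "elastic-blog-engine",
  endpointBase: "https://YOUR_DEPLOYMENT.ent.us-east-1.aws.found.io",
  searchKey: "search-xxxxxxxxxxxx",
});

export default function App() {
  return (
    <SearchProvider config={{ apiConnector: connector, alwaysSearchOnInitialLoad: true }}>
      <SearchBox />
      <Results titleField="title" urlField="url" />
    </SearchProvider>
  );
}
```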

To set up the search experience:

  1. Open the Search UI page.
  2. In the Filter fields menu, select headings.
  3. In the Sort fields menu, select title.
  4. Leave all other fields at their defaults and select Generate search experience. This will open a complete search UI based on your indexed documents.
  5. Test out a few queries, and note some of the powerful search features, all of which you can customize for your users:

    • Search results can be sorted by titles or by any other field.
    • Results can be filtered into configurable buckets.
    • Typeahead query suggestions help guide users to the most effective search terms.
    • Queries with truncated or misspelled terms still produce highly relevant results.

You should now have a fully customizable search engine experience. Try a few queries to test it out.

You might want to download the source code and run your search UI locally. To do so, follow the instructions in the optional next step.

(Optional) Download and run Search UI package locally

You’ll need Node.js and npm installed for this optional step.

Follow these steps to download and run the search UI locally:

  1. On the generated search experience page, select Download.
  2. Save and then open up the package. The src/config/engine.json file contains all of the preset configuration settings, and setting options are listed in the README file.
  3. Open a terminal window and cd into the package directory.
  4. Run npm install to set everything up.
  5. Run npm start to launch the application.

If you run into any problems, refer to the Search UI or App Search Troubleshooting documentation.

Once you’re comfortable running your Search UI locally, you can think about deploying to production environments.

Improve search results using analytics

When people start using your search engine, you can use analytics to refine and improve their experience: analytics give you actionable data about real user behavior.

Open App Search > Engines > your_engine_name > Analytics to see how users are interacting with your search experience. You’ll see a cumulative table and graph showing:

  1. Total queries: what people are searching for the most.
  2. Queries with no results: users are looking for something on your site and not finding it.
  3. Total clicks: what people are clicking on the most.

Use analytics to continually optimize search results to match user needs.
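
The same analytics are available over the REST API, which is handy for dashboards or scheduled reports. A sketch of pulling the top queries; the endpoint, engine name, and private key are placeholders.

```typescript
// Placeholders: your Enterprise Search endpoint, engine name, and a private key.
const ENDPOINT = "https://YOUR_DEPLOYMENT.ent.us-east-1.aws.found.io";
const PRIVATE_KEY = "private-xxxxxxxxxxxx";

async function topQueries(): Promise<void> {
  const response = await fetch(
    `${ENDPOINT}/api/as/v1/engines/elastic-blog-engine/analytics/queries`,
    { headers: { Authorization: `Bearer ${PRIVATE_KEY}` } }
  );
  const { results } = await response.json();
  // Each entry pairs a query term with its usage counts.
  for (const entry of results) {
    console.log(entry);
  }
}

topQueries().catch(console.error);
```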

Synonyms and curations

Based on these analytics, you may choose to refine your relevance tuning, or add synonym sets and curations to your search engine.

Synonyms allow you to group queries that have the same meaning in your dataset. They are most useful when you know the precise terms users are searching for.

Example:

You can see from the analytics that the query golang is not returning results. You understand that these users are interested in blogs related to the Go programming language. Create a synonym set that relates the terms go and golang. This ensures that the term golang produces relevant results, such as blog posts about the Elasticsearch Go client.

To create a synonym set follow these steps:

  1. Navigate to App Search > Engines > your_engine_name > Synonyms.
  2. Select Create a Synonym Set.
  3. Add your synonyms.
  4. Select Add value to add as many synonyms as you need.
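
The same synonym set can also be created over the REST API. A minimal sketch; the endpoint, engine name, and private key are placeholders.

```typescript
// Placeholders: your Enterprise Search endpoint, engine name, and a private key.
const ENDPOINT = "https://YOUR_DEPLOYMENT.ent.us-east-1.aws.found.io";
const PRIVATE_KEY = "private-xxxxxxxxxxxx";

async function createSynonymSet(): Promise<void> {
  const response = await fetch(
    `${ENDPOINT}/api/as/v1/engines/elastic-blog-engine/synonyms`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${PRIVATE_KEY}`,
        "Content-Type": "application/json",
      },
      // Queries for "golang" will now also match documents about "go", and vice versa.
      body: JSON.stringify({ synonyms: ["go", "golang"] }),
    }
  );
  console.log(await response.json());
}

createSynonymSet().catch(console.error);
```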

To provide even more precise and curated results, use curations.

Curations allow you to customize search results for specific queries. Use promoted documents to guarantee certain documents match a query and receive the highest relevance scores.

Example:

People might be using the generic query "latest" when searching the blog content. However, this query is vague and matches too many documents. You can tell from the analytics that users are not clicking on many pages when they see the results for this query. Use a curation to ensure that the blog post about the latest Elastic release is returned as the first result.

To add a curation for the term latest follow these steps:

  1. Navigate to App Search > Engines > your_engine_name > Curations.
  2. Select Create a curation.
  3. Enter the query you want to curate, in this case the term latest.
  4. The UI will display the top organic results for that query. Star documents from the organic results, or select Add a result manually.

Curations also allow you to hide results you’d prefer users were not directed to. In this example, you might want to hide blog posts that discuss an older version of the Elastic stack, which your team doesn’t use.
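
Curations have a REST API as well, covering both promoted and hidden documents. A sketch; the endpoint, engine name, private key, and both document IDs are placeholders.

```typescript
// Placeholders: endpoint, engine name, private key, and both document IDs.
const ENDPOINT = "https://YOUR_DEPLOYMENT.ent.us-east-1.aws.found.io";
const PRIVATE_KEY = "private-xxxxxxxxxxxx";

async function curateLatest(): Promise<void> {
  const response = await fetch(
    `${ENDPOINT}/api/as/v1/engines/elastic-blog-engine/curations`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${PRIVATE_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        queries: ["latest"],                  // the query to curate
        promoted: ["doc-id-of-release-post"], // pinned to the top (placeholder ID)
        hidden: ["doc-id-of-outdated-post"],  // removed from results (placeholder ID)
      }),
    }
  );
  console.log(await response.json());
}

curateLatest().catch(console.error);
```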

Summary and next steps

If you followed along with this guide, you’ve learned how to:

  • Use the Elastic web crawler to index webpage content into an Elasticsearch index
  • Create an App Search engine based on that index
  • Configure a refined Search UI experience to search over the content
  • Use analytics to understand how users interact with the search results
  • Use synonyms and curations to improve the search experience
