General Scout best practices

For test-type-specific guidance, see UI test best practices and API test best practices.

Scout is deployment-agnostic: write once, run locally and on Elastic Cloud.

  • Every suite must have deployment tags. Use tags to target the environments where your tests apply (for example, a feature that only exists in stateful deployments).
  • Within a test, avoid relying on configuration, data, or behavior specific to a single deployment. Test logic should produce the same result locally and on Cloud.
  • Run your tests against a real Elastic Cloud project before merging to catch environment-specific surprises early. See Run tests on Elastic Cloud for setup instructions.

A test should live in the plugin or package that owns the code it exercises. When writing or reviewing a test, confirm that the scenarios logically belong to the plugin they were added to:

  • API tests: the routes under test should be defined in this plugin's /server directory.
  • UI tests: the UI being driven should come from this plugin's /public directory — a quick look there is usually enough to understand what the plugin renders and whether the test fits.

This also keeps Scout's selective testing effective: it runs only the tests for modules affected by a PR, so a test placed in the wrong plugin won't be triggered by changes to the code it actually covers. The full suite still runs post-merge on kibana-on-merge.

When a feature is gated behind a flag, enable it at runtime with apiServices.core.settings() rather than creating a custom server config. Runtime flags work locally and on Cloud, don’t require a server restart, and avoid the CI cost of a dedicated server instance.

For the full guide (including when a custom server config is unavoidable), see Feature flags.

When you add new tests, fix flakes, or make significant changes, run the same tests multiple times to catch flakiness early. A good starting point is 20–50 runs.

Prefer doing this locally first (faster feedback), and use the Flaky Test Runner in CI when needed. See Debug flaky tests for guidance.

  • Keep one top-level suite per file (test.describe).
  • Avoid nested describe blocks. Use test.step for structure inside a test.
  • Don’t rely on test file execution order (it’s not guaranteed).
  • Don’t assume a previous test in the suite already set up the data you need (if that test fails or is skipped, the dependent test will break with a misleading error).

Use test.step() (or apiTest.step() in API tests) to structure a multi-step flow within a single test. It keeps the test in one context (faster, clearer reporting) and produces labelled entries in the test report that make failures easier to diagnose. Group closely related actions into a single step when it keeps the report readable without hiding intent.

Test names should read like a sentence describing expected behavior. Clear names make failures self-explanatory and test suites scannable.

Prefer “one role + one flow per file” and keep spec files small (roughly 4–5 short tests or 2–3 longer ones). The test runner balances work at the spec-file level, so oversized files become bottlenecks during parallel execution. Put shared login/navigation in beforeEach.

If many files share the same “one-time” work (archives, API calls, settings), move it to a global setup hook.

Note

Global setup hooks have no corresponding teardown. Keep operations that require cleanup (such as kbnClient.importExport.load()) in beforeAll/afterAll hooks so saved objects are properly removed after tests run. See Global setup hook: When to use for guidance.

It’s common for test suites to load Elasticsearch or Kibana archives that are barely used (or not used at all). Unused archives slow down setup, waste resources, and make it harder to understand what a test actually depends on. Check that your tests ingest only the data they actually need.

Use esArchiver.loadIfNeeded(), which skips ingestion if the index already exists (useful when multiple suites share the same data).

Warning

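loadIfNeeded() checks at the index level, not individual documents. If a test deletes specific documents, subsequent runs or retries won't restore them, because the index still exists and ingestion is skipped. Re-index any documents a test deletes (for example, in a cleanup hook) so later runs start from the full data set.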
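
To see why the index-level check matters, the sketch below models the behavior with an in-memory store. FakeCluster and its methods are hypothetical stand-ins written for this illustration; they are not the esArchiver implementation and only mimic the "skip when the index exists" rule described above.

```typescript
// Hypothetical in-memory model of index-level "load if needed" semantics.
// This is NOT esArchiver; it only mimics the "skip when the index exists" rule.
type Doc = { id: string };

class FakeCluster {
  private indices = new Map<string, Map<string, Doc>>();

  // Ingest the archive only when the index is missing entirely.
  loadIfNeeded(index: string, archive: Doc[]): 'loaded' | 'skipped' {
    if (this.indices.has(index)) {
      return 'skipped'; // the index exists, so its documents are never re-checked
    }
    this.indices.set(index, new Map(archive.map((d) => [d.id, d] as [string, Doc])));
    return 'loaded';
  }

  deleteDoc(index: string, id: string): void {
    this.indices.get(index)?.delete(id);
  }

  count(index: string): number {
    return this.indices.get(index)?.size ?? 0;
  }
}

const sampleArchive: Doc[] = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const cluster = new FakeCluster();

const firstLoad = cluster.loadIfNeeded('logs', sampleArchive); // index missing: ingests 3 docs
cluster.deleteDoc('logs', 'b'); // a test removes one document
const secondLoad = cluster.loadIfNeeded('logs', sampleArchive); // index exists: nothing restored
const remainingDocs = cluster.count('logs'); // still 2, not 3
```

In this model the second call is skipped even though a document is missing, which is the situation the warning describes.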

Cleanup in the test body doesn’t run after a failure. Prefer afterEach / afterAll. Don’t duplicate the same teardown in the test body when a hook already runs it; duplication invites unnecessary try/catch and drift between paths.
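
The difference can be shown with a toy runner. runTest below is a hypothetical stand-in that always calls its hook, the way afterEach is invoked after a failing test; it is a sketch for illustration and not Playwright or Scout code.

```typescript
// Toy model of a test runner, written for illustration only (not Scout or Playwright).
// Like afterEach, the hook runs whether the body passes or throws.
function runTest(body: () => void, afterEachHook: () => void): 'passed' | 'failed' {
  try {
    body();
    return 'passed';
  } catch {
    return 'failed';
  } finally {
    afterEachHook();
  }
}

// Tiny assertion helper that throws on mismatch, as a real expect() would.
function assertEqual(actual: number, expected: number): void {
  if (actual !== expected) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}

// Stand-in for resources a test creates (saved objects, indices, and so on).
const createdResources = new Set<string>();

// Case 1: cleanup lives in the test body and is skipped when the assertion throws.
const inlineOutcome = runTest(
  () => {
    createdResources.add('inline-cleanup');
    assertEqual(1, 2); // fails here
    createdResources.delete('inline-cleanup'); // never reached
  },
  () => {}
);
const inlineLeaked = createdResources.has('inline-cleanup'); // true: the resource leaked

// Case 2: cleanup lives in the hook and runs even though the body failed.
const hookOutcome = runTest(
  () => {
    createdResources.add('hook-cleanup');
    assertEqual(1, 2); // fails here too
  },
  () => {
    createdResources.delete('hook-cleanup');
  }
);
const hookLeaked = createdResources.has('hook-cleanup'); // false: the hook cleaned up
```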

Tests should be clean and declarative. If a helper might return an expected error (for example, 404 during cleanup), the helper should handle it internally, for example by accepting an ignoreErrors option or treating a 404 during deletion as a success.
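
One possible shape for such a helper is sketched below. The names HttpDelete, DeleteOptions, and deleteIfExists are assumptions made for this example and are not part of @kbn/scout; the transport is passed in as a function so the sketch stays independent of any particular client.

```typescript
// Hypothetical helper: HttpDelete, DeleteOptions, and deleteIfExists are
// illustrative names, not part of @kbn/scout.
type HttpDelete = (path: string) => Promise<{ status: number }>;

interface DeleteOptions {
  // Non-2xx statuses to treat as success. Defaults to [404].
  ignoreStatuses?: number[];
}

async function deleteIfExists(
  httpDelete: HttpDelete,
  path: string,
  { ignoreStatuses = [404] }: DeleteOptions = {}
): Promise<'deleted' | 'ignored'> {
  const { status } = await httpDelete(path);
  if (status >= 200 && status < 300) {
    return 'deleted';
  }
  if (ignoreStatuses.indexOf(status) !== -1) {
    return 'ignored'; // for example, the resource was already gone
  }
  // Anything else is unexpected and should surface to the caller.
  throw new Error(`DELETE ${path} failed with status ${status}`);
}
```

With a helper like this, a cleanup hook can call it as a single line without a try/catch, and unexpected statuses (such as 500) still fail loudly.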

When a test verifies multiple independent items (KPI tiles, chart counts, table columns, several response fields), consider using expect.soft() so the test keeps checking the remaining items instead of stopping at the first failure. This makes troubleshooting easier because a single run reports every mismatch. Playwright still fails the test at the end if any soft assertion failed.
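
The semantics can be illustrated without a browser. The SoftChecks class below is a minimal model written for this example, not Playwright's implementation of expect.soft(); the KPI values are made up.

```typescript
// Minimal model of soft-assertion semantics, for illustration only.
// This is not Playwright's implementation of expect.soft().
class SoftChecks {
  private readonly failures: string[] = [];

  // Record a mismatch instead of throwing, so later checks still run.
  equal<T>(label: string, actual: T, expected: T): void {
    if (actual !== expected) {
      this.failures.push(`${label}: expected ${String(expected)}, got ${String(actual)}`);
    }
  }

  getFailures(): string[] {
    return this.failures.slice();
  }

  // Called once at the end: the test still fails if anything was recorded.
  assertNoFailures(): void {
    if (this.failures.length > 0) {
      throw new Error(`${this.failures.length} soft check(s) failed:\n${this.failures.join('\n')}`);
    }
  }
}

// Hypothetical KPI values read from a page.
const kpiTiles = { hosts: 12, alerts: 3, users: 7 };

const softChecks = new SoftChecks();
softChecks.equal('hosts', kpiTiles.hosts, 12); // passes
softChecks.equal('alerts', kpiTiles.alerts, 5); // recorded, does not stop the test
softChecks.equal('users', kpiTiles.users, 8); // still evaluated and recorded
```

Calling softChecks.assertNoFailures() at the end would throw once and list both mismatches, which mirrors how the test is still reported as failed.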

If a value is reused across suites (archive paths, fixed time ranges, endpoints, common headers), extract it into a shared constants.ts file. This reduces duplication and typos, and makes updates safer.
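
A shared constants file might look like the sketch below. Every name and value in it is an illustrative placeholder chosen for this example, not a real Kibana archive path or endpoint.

```typescript
// constants.ts: a sketch of shared test constants.
// Every name and value here is an illustrative placeholder.
export const ES_ARCHIVES = {
  EXAMPLE_LOGS: 'path/to/es_archives/example_logs',
} as const;

export const TIME_RANGE = {
  from: '2024-01-01T00:00:00.000Z',
  to: '2024-01-02T00:00:00.000Z',
} as const;

export const API_PATHS = {
  EXAMPLE_ITEMS: '/api/example/items',
} as const;

export const COMMON_HEADERS = {
  'kbn-xsrf': 'scout-example',
} as const;

// Small helper so every suite builds item URLs the same way.
export const exampleItemPath = (id: string): string =>
  `${API_PATHS.EXAMPLE_ITEMS}/${encodeURIComponent(id)}`;
```

Spec files would then import what they need from this module instead of repeating string literals, so a changed path or time range is updated in one place.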

Avoid admin unless there’s no alternative. Minimal permissions catch real permission bugs and keep tests realistic. Also test the forbidden path: verify that an under-privileged role receives 403 for endpoints it shouldn’t access.

See browser authentication and API authentication.

If you build a helper that will benefit other tests, consider upstreaming it:

  • Reusable across many plugins/teams: contribute to @kbn/scout
  • Reusable but solution-scoped: contribute to the relevant solution Scout package
  • Plugin-specific: keep it in your plugin’s test/scout tree

For the full guidance, see Scout.

Tip: Keep Scout packages package- and plugin-agnostic

When you move a helper into @kbn/scout or a solution Scout package, don't import types from plugins or plugin-scoped packages. Scout packages are intentionally slim, shared infrastructure — adding a dependency on a specific plugin's types pulls that plugin into every consumer and breaks the sharing model.