Enterprise Search server known issues


The following known issues affect Enterprise Search server.

Enterprise Search server does not provide a way to configure trusted certificates when interacting with third-party services that use a self-signed certificate authority.

This applies to all versions of Enterprise Search server.

Enterprise Search server does not support cross-cluster replication.

This applies to all versions of Enterprise Search server.

Restoring a snapshot to another deployment on Elastic Cloud has limitations due to encryption keys.

For details, see Restoring a snapshot across deployments in the Elastic Cloud documentation.

This applies to all versions of Enterprise Search server.

Resolving locks that were left behind in Enterprise Search.

Enterprise Search has a locking mechanism that prevents concurrent writes or index migrations by multiple Enterprise Search instances to its system indices (.ent-search*). .ent-search-db-lock-<date> is the name of the system index that manages these locks.

We have observed multiple issues in the past where Enterprise Search left locks behind, preventing Enterprise Search instances from starting properly or performing their normal operations.

The current Enterprise Search locking mechanism is sensitive to underlying Elasticsearch performance issues. For smaller Elasticsearch deployments (instances with <= 8 GB RAM), check whether the CPU credits have been depleted, as this can leave the cluster struggling under the current load. As a result, Enterprise Search can have difficulties with optimistic versioning, leading to locks not being released correctly.

The following are a few observed symptoms:
1. After an upgrade of Enterprise Search on Elastic Cloud, instances are failing to start up:
```
{
  "instance-0000000002": "Instance [instance-0000000002] failed to start after [4] attempts due to ['Instance exited unexpectedly', 'Unknown failure, boot logs not analyzed', 'Instance exited with exit code [0]']"
}
```
  The Enterprise Search logs indicate that it is waiting for another instance to release a lock:
```
Waiting on another instance to release the "installation" lock: {"product_version"=>"8.10.3", "locked_at"=>"2023-10-16T15:03:56+00:00", "expires_at"=>"2023-10-16T15:07:35+00:00", "last_heartbeat_at"=>"2023-10-16T15:05:35+00:00", "status_update_at"=>"2023-10-16T15:03:09+00:00", "last_status"=>"[Failed] Ensuring migrations tracking index exists: Error = Faraday::TimeoutError: Read timed out", "node_name"=>"22ac384e1a01", "pid"=>21, "tid"=>4004}
```
2. Enterprise Search instances become unhealthy intermittently. For example, in a deployment with three Enterprise Search instances, one instance becomes unhealthy, a couple of minutes later another one becomes unhealthy, then all instances become healthy again, and the cycle repeats.
3. A version conflict error was observed in the Enterprise Search logs:
```
version conflict, document already exists (current version [1948]) index '.ent-search-db-lock-20200304' (Swiftype::es::VersionConflictEngineException)
```
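
The lock details in the "Waiting on another instance" message shown in the first symptom can be inspected programmatically. The following is a minimal sketch, not an official tool: the function names `parse_lock_details` and `lock_is_expired` are hypothetical, the sample line is an abbreviated copy of the log message above, and the parsing assumes the hash is printed in the same Ruby-style `"key"=>value` form. Treating a lock whose `expires_at` is long past as a possible leftover lock is also an assumption, not documented behavior.

```python
import json
from datetime import datetime, timezone

# Abbreviated copy of the log message shown in the first symptom.
SAMPLE = (
    'Waiting on another instance to release the "installation" lock: '
    '{"product_version"=>"8.10.3", "locked_at"=>"2023-10-16T15:03:56+00:00", '
    '"expires_at"=>"2023-10-16T15:07:35+00:00", '
    '"last_heartbeat_at"=>"2023-10-16T15:05:35+00:00", '
    '"node_name"=>"22ac384e1a01", "pid"=>21, "tid"=>4004}'
)


def parse_lock_details(log_line):
    """Parse the Ruby-style hash at the end of the log line into a dict."""
    start = log_line.index("{")
    end = log_line.rindex("}") + 1
    # Ruby prints hashes as "key"=>value, while JSON expects "key": value.
    # This simple replacement breaks if a value itself contains "=>".
    return json.loads(log_line[start:end].replace("=>", ":"))


def lock_is_expired(lock, now):
    """Return True if the lock's expires_at timestamp is earlier than now."""
    return now > datetime.fromisoformat(lock["expires_at"])


if __name__ == "__main__":
    lock = parse_lock_details(SAMPLE)
    print(lock["node_name"], lock_is_expired(lock, datetime.now(timezone.utc)))
```

If instances keep waiting on a lock that this check reports as expired, that may point to the leftover-lock situation described here.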
If you encounter the symptoms above, you can use the following workaround to recover:
1. Stop all instances of Enterprise Search.
2. Take a successful full snapshot of the Elasticsearch cluster as a backup.
3. Delete all records in the latest .ent-search-db-lock-<date> index.
4. Restart the instances one at a time: start the first Enterprise Search instance and wait for it to complete startup, then start the second instance and wait for it to complete startup, and so on.

This applies to all versions of Enterprise Search server.
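
Step 3 of the workaround can be sketched with the Elasticsearch delete-by-query API. This is an illustration under stated assumptions rather than an official procedure: the helper names `latest_lock_index` and `build_delete_all_request` are hypothetical, the date suffix is assumed to be eight digits (as in `.ent-search-db-lock-20200304` from the error above), and the base URL and API key are placeholders. The account used may also need privileges that allow it to modify system indices.

```python
import json
import re
import urllib.request

# Assumption: lock indices end in an eight-digit date such as 20200304.
LOCK_INDEX_PATTERN = re.compile(r"^\.ent-search-db-lock-(\d{8})$")


def latest_lock_index(index_names):
    """Return the lock index with the most recent date suffix, or None."""
    dated = []
    for name in index_names:
        match = LOCK_INDEX_PATTERN.match(name)
        if match:
            dated.append((match.group(1), name))
    # Eight-digit dates of the same length sort chronologically as strings.
    return max(dated)[1] if dated else None


def build_delete_all_request(base_url, index_name, api_key):
    """Build, but do not send, a delete-by-query request for every document."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_name}/_delete_by_query?refresh=true",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}",  # placeholder credential
        },
    )


if __name__ == "__main__":
    names = [".ent-search-db-lock-20200304", ".ent-search-db-lock-20230101"]
    request = build_delete_all_request(
        "https://localhost:9200", latest_lock_index(names), "PLACEHOLDER_KEY"
    )
    print(request.get_method(), request.full_url)
```

The request is only constructed here. Sending it, for example with `urllib.request.urlopen(request)`, should happen only after all Enterprise Search instances are stopped and the snapshot from step 2 has completed.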
