- @metadata
- A special field for storing content that you don’t want to include in output events. For example, the @metadata field is useful for creating transient fields for use in conditional statements.
- administration console
- A component of Elastic Cloud Enterprise that provides the API server for the Cloud UI. Also syncs cluster and allocator data from ZooKeeper to Elasticsearch.
- allocator
- Manages hosts that contain Elasticsearch and Kibana nodes. Controls the lifecycle of these nodes by creating new containers and managing the nodes within these containers when requested. Used to scale the capacity of your Elastic Cloud Enterprise installation.
- analysis
Analysis is the process of converting full text to terms. Depending on which analyzer is used, these phrases: "FOO BAR", "Foo-Bar", "foo,bar" will probably all result in the terms foo and bar. These terms are what is actually stored in the index.
A full text query (not a term query) for FoO:bAR will also be analyzed to the terms foo and bar and will thus match the terms stored in the index.
It is this process of analysis (both at index time and at search time) that allows Elasticsearch to perform full text queries.
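The effect of analysis can be illustrated with a toy analyzer. This is only a minimal sketch, assuming an analyzer that lowercases text and splits on non-alphanumeric characters; the `analyze` function is hypothetical and is not how Elasticsearch implements its analyzers.

```python
import re


def analyze(text):
    """Toy analyzer (assumption): lowercase, then split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


# Differently formatted phrases collapse to the same terms,
# which is why a full text query can match the indexed terms.
for phrase in ["FOO BAR", "Foo-Bar", "foo,bar", "FoO:bAR"]:
    print(phrase, "->", analyze(phrase))
```

Because the same function runs at index time and at search time in this sketch, the query text and the stored text end up as identical terms.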
- availability zone
- Contains resources available to an Elastic Cloud Enterprise installation that are isolated from other availability zones to safeguard against failure. Could be a rack, a server zone or some other logical constraint that creates a failure boundary. In a highly available cluster, the nodes of a cluster are spread across two or three availability zones to ensure that the cluster can survive the failure of an entire availability zone. Also see Fault Tolerance (High Availability).
- beats runner
- Used to send Filebeat and Metricbeat information to the logging cluster.
- bucket
- The machine learning features use the concept of a bucket to divide the time series into batches for processing. The bucket span is part of the configuration information for a job. It defines the time interval that is used to summarize and model the data. This is typically between 5 minutes and 1 hour and it depends on your data characteristics. When you set the bucket span, take into account the granularity at which you want to analyze, the frequency of the input data, the typical duration of the anomalies, and the frequency at which alerting is required.
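As a rough illustration of dividing a time series into buckets, the sketch below groups timestamped values into fixed-width intervals and averages each one. The function name, the sample data, and the choice of a mean as the summary are all assumptions made for this example; the actual modeling done by the machine learning features is more involved.

```python
from collections import defaultdict


def summarize_by_bucket(points, bucket_span_seconds):
    """Group (epoch_seconds, value) pairs into fixed-width buckets and average each bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % bucket_span_seconds)
        buckets[bucket_start].append(value)
    return {start: sum(values) / len(values) for start, values in sorted(buckets.items())}


# Hypothetical data with a 5-minute (300 second) bucket span.
points = [(0, 10.0), (120, 20.0), (299, 30.0), (300, 40.0), (610, 50.0)]
summary = summarize_by_bucket(points, bucket_span_seconds=300)
print(summary)
```

A shorter span gives finer granularity but fewer data points per bucket, which is the trade-off to weigh when choosing the bucket span.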
- client forwarder
- Used for secure internal communications between various components of Elastic Cloud Enterprise and ZooKeeper.
- Cloud UI
- Provides web-based access to manage your Elastic Cloud Enterprise installation, supported by the administration console.
- cluster
- A cluster consists of one or more nodes which share the same cluster name. Each cluster has a single master node which is chosen automatically by the cluster and which can be replaced if the current master node fails.
- codec plugin
- A Logstash plugin that changes the data representation of an event. Codecs are essentially stream filters that can operate as part of an input or output. Codecs enable you to separate the transport of messages from the serialization process. Popular codecs include json, msgpack, and plain (text).
- conditional
- A control flow that executes certain actions based on whether a statement (also called a condition) is true or false. Logstash supports if, else if, and else statements. You can use conditional statements to apply filters and send events to a specific output based on conditions that you specify.
- constructor
- Directs allocators to manage containers of Elasticsearch and Kibana nodes and maximizes the utilization of allocators. Monitors plan change requests from the Cloud UI and determines how to transform the existing cluster. In a highly available installation, places cluster nodes within different availability zones to ensure that the cluster can survive the failure of an entire availability zone.
- container
- Includes an instance of Elastic Cloud Enterprise software and its dependencies. Used to provision similar environments, to assign a guaranteed share of host resources to nodes, and to simplify operational effort in Elastic Cloud Enterprise.
- coordinator
- Consists of a logical grouping of some Elastic Cloud Enterprise services and acts as a distributed coordination system and resource scheduler.
- cross-cluster replication (CCR)
- The cross-cluster replication feature enables you to replicate indices in remote clusters to your local cluster. For more information, see Cross-cluster replication.
- cross-cluster search (CCS)
- The cross-cluster search feature enables any node to act as a federated client across multiple clusters. See Cross-cluster search.
- datafeed
- Machine learning jobs can analyze data either as a one-off batch or continuously in real time. Datafeeds retrieve data from Elasticsearch for analysis. Alternatively, you can post data from any source directly to a machine learning API.
- detector
- As part of the configuration information that is associated with a machine learning job, detectors define the type of analysis that needs to be done. They also specify which fields to analyze. You can have more than one detector in a job, which is more efficient than running multiple jobs against the same data.
- director
- Manages the ZooKeeper datastore. This role is often shared with the coordinator, though in production deployments it can be separated.
- document
A document is a JSON object (also known in other languages as a hash / hashmap / associative array) which contains zero or more fields, or key-value pairs.
The original JSON document that is indexed will be stored in the _source field, which is returned by default when getting or searching for a document.
- Elastic Common Schema (ECS)
- A document schema for Elasticsearch, for use cases such as logging and metrics. ECS defines a common set of fields and their datatypes, and gives guidance on their correct usage. ECS is used to improve uniformity of event data coming from different sources.
- event
- A single unit of information, containing a timestamp plus additional data. An event arrives via an input, and is subsequently parsed, timestamped, and passed through the Logstash pipeline.
- field
A document contains a list of fields, or key-value pairs. The value can be a simple (scalar) value (for example, a string, integer, date), or a nested structure like an array or an object. A field is similar to a column in a table in a relational database.
The mapping for each field has a field type (not to be confused with document type) which indicates the type of data that can be stored in that field, e.g. integer, string, object. The mapping also allows you to define (amongst other things) how the value for a field should be analyzed.
In Logstash, this term refers to an event property. For example, each event in an Apache access log has properties, such as a status code (200, 404), request path ("/", "index.html"), HTTP verb (GET, POST), client IP address, and so on. Logstash uses the term "fields" to refer to these properties.
- field reference
- A reference to an event field. This reference may appear in an output block or filter block in the Logstash config file. Field references are typically wrapped in square ([]) brackets, for example [fieldname]. If you are referring to a top-level field, you can omit the [] and simply use the field name. To refer to a nested field, you specify the full path to that field: [top-level field][nested field].
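The bracket syntax can be read as a path into a nested structure. The sketch below resolves such a reference against a Python dict that stands in for an event; `resolve_field_reference` and the sample event are hypothetical and do not reflect how Logstash itself evaluates references.

```python
import re


def resolve_field_reference(event, reference):
    """Look up a bracketed reference such as "[request][verb]" in a nested dict."""
    # A bare name (no brackets) refers to a top-level field.
    path = re.findall(r"\[([^\[\]]+)\]", reference) or [reference]
    value = event
    for key in path:
        value = value[key]
    return value


# Hypothetical event with one top-level field and one nested field.
event = {"status": 200, "request": {"verb": "GET", "path": "/index.html"}}
print(resolve_field_reference(event, "status"))
print(resolve_field_reference(event, "[status]"))
print(resolve_field_reference(event, "[request][verb]"))
```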
- filter
- A filter is a non-scoring query, meaning that it does not score documents. It is only concerned with answering the question "Does this document match?". The answer is always a simple, binary yes or no. This kind of query is said to be made in a filter context, hence it is called a filter. Filters are simple checks for set inclusion or exclusion. In most cases, the goal of filtering is to reduce the number of documents that have to be examined.
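The yes-or-no nature of a filter can be sketched as a plain boolean predicate over documents. This is only an illustration with hypothetical names and data; it says nothing about how Elasticsearch evaluates or caches filters internally.

```python
def status_filter(document, wanted_status):
    """Filter context: answer only "does this document match?" with True or False."""
    return document.get("status") == wanted_status


# Hypothetical documents.
documents = [
    {"path": "/", "status": 200},
    {"path": "/missing", "status": 404},
    {"path": "/index.html", "status": 200},
]

# No relevance score is computed; documents are simply kept or dropped.
matching = [doc for doc in documents if status_filter(doc, 200)]
print(len(matching))
```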
- filter plugin
- A Logstash plugin that performs intermediary processing on an event. Typically, filters act upon event data after it has been ingested via inputs, by mutating, enriching, and/or modifying the data according to configuration rules. Filters are often applied conditionally depending on the characteristics of the event. Popular filter plugins include grok, mutate, drop, clone, and geoip. Filter stages are optional.
- follower index
- Follower indices are the target indices for cross-cluster replication. They exist in your local cluster and replicate leader indices.
- gem
- A self-contained package of code that’s hosted on RubyGems.org. Logstash plugins are packaged as Ruby Gems. You can use the Logstash plugin manager to manage Logstash gems.
- hot thread
- A Java thread that has high CPU usage and executes for a longer than normal period of time.
- id
- The ID of a document identifies a document. The index/id of a document must be unique. If no ID is provided, then it will be auto-generated. (Also see routing).
- indexer
- A Logstash instance that is tasked with interfacing with an Elasticsearch cluster in order to index event data.
- input plugin
- A Logstash plugin that reads event data from a specific source. Input plugins are the first stage in the Logstash event processing pipeline. Popular input plugins include file, syslog, redis, and beats.
- job
- Machine learning jobs contain the configuration information and metadata necessary to perform an analytics task.
- leader index
- Leader indices are the source indices for cross-cluster replication. They exist on remote clusters and are replicated to follower indices.
- machine learning node
- A machine learning node is a node that has xpack.ml.enabled and node.ml set to true, which is the default behavior. If you set node.ml to false, the node can service API requests but it cannot run jobs. If you want to use machine learning features, there must be at least one machine learning node in your cluster.
- mapping
A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.
- master node
- Handles write requests for the cluster and publishes changes to other nodes in an ordered fashion. Each cluster has a single master node which is chosen automatically by the cluster and is replaced if the current master node fails. Also see node.
- merge
- The combining of Lucene segments, either automatically in the background or initiated using force merge.
- message broker
- Also referred to as a message buffer or message queue, a message broker is external software (such as Redis, Kafka, or RabbitMQ) that stores messages from the Logstash shipper instance as an intermediate store, waiting to be processed by the Logstash indexer instance.
- node
A node is a running instance of Elasticsearch or Kibana which belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but usually you should have one node per server.
At startup, a node will use unicast to discover an existing cluster with the same cluster name and will try to join that cluster.
- output plugin
- A Logstash plugin that writes event data to a specific destination. Outputs are the final stage in the event pipeline. Popular output plugins include elasticsearch, file, graphite, and statsd.
- pipeline
- A term used to describe the flow of events through the Logstash workflow. A pipeline typically consists of a series of input, filter, and output stages. Input stages get data from a source and generate events; filter stages, which are optional, modify the event data; and output stages write the data to a destination. Inputs and outputs support codecs that enable you to encode or decode the data as it enters or exits the pipeline without having to use a separate filter.
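The three stages can be pictured as chained generators. The following is a conceptual sketch only, with hypothetical function names and sample input; it does not use or mirror any Logstash API.

```python
def input_stage(lines):
    """Input: read raw lines from a source and turn each one into an event."""
    for line in lines:
        yield {"message": line.strip()}


def filter_stage(events):
    """Filter (optional): enrich each event by parsing fields out of the message."""
    for event in events:
        verb, _, path = event["message"].partition(" ")
        event["verb"] = verb
        event["path"] = path
        yield event


def output_stage(events, destination):
    """Output: write each processed event to a destination (here, a list)."""
    for event in events:
        destination.append(event)


destination = []
output_stage(filter_stage(input_stage(["GET /index.html\n", "POST /login\n"])), destination)
print(destination)
```

Dropping `filter_stage` from the chain still yields a working flow, which mirrors the fact that filter stages are optional.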
- plan
- Specifies the configuration and topology of an Elasticsearch or Kibana cluster, such as capacity, availability, and Elasticsearch version. When changing a plan, the constructor determines how to transform the existing cluster into the pending plan.
- plugin
- A self-contained software package that implements one of the stages in the Logstash event processing pipeline. The list of available plugins includes input plugins, output plugins, codec plugins, and filter plugins. The plugins are implemented as Ruby gems and hosted on RubyGems.org. You define the stages of an event processing pipeline by configuring plugins.
- plugin manager
- Accessed via the bin/logstash-plugin script, the plugin manager enables you to manage the lifecycle of plugins in your Logstash deployment. You can install, remove, and upgrade plugins by using the plugin manager Command Line Interface (CLI).
- primary shard
You cannot change the number of primary shards in an index once the index is created.
See also routing.
- proxy
- A highly available, TLS-enabled proxy layer that routes user requests, mapping cluster IDs that are passed in request URLs for the container to the cluster nodes handling the user requests.
- query
A request for information from Elasticsearch. You can think of a query as a question, written in a way Elasticsearch understands. A search consists of one or more queries combined.
There are two types of queries: scoring queries and filters. For more information about query types, see Query and filter context.
- recovery
The process of syncing a shard copy from a source shard. Upon completion, the recovery process makes the shard copy available for queries.
Recovery automatically occurs anytime a shard moves to a different node in the same cluster, including:
- Node startup
- Node failure
- Index shard replication
- Snapshot restoration
- reindex
- To cycle through some or all documents in one or more indices, re-writing them into the same or new index in a local or remote cluster. This is most commonly done to update mappings, or to upgrade Elasticsearch between two incompatible index versions.
- replica shard
Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes:
- increase failover: a replica shard can be promoted to a primary shard if the primary fails
- increase performance: get and search requests can be handled by primary or replica shards.
By default, each primary shard has one replica, but the number of replicas can be changed dynamically on an existing index. A replica shard will never be started on the same node as its primary shard.
- roles token
- Enables a host to join an existing Elastic Cloud Enterprise installation and grants permission to hosts to hold certain roles, such as the allocator role. Used when installing Elastic Cloud Enterprise on additional hosts, a roles token helps secure Elastic Cloud Enterprise by making sure that only authorized hosts become part of the installation.
- routing
When you index a document, it is stored on a single primary shard. That shard is chosen by hashing the routing value. By default, the routing value is derived from the ID of the document or, if the document has a specified parent document, from the ID of the parent document (to ensure that child and parent documents are stored on the same shard).
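Choosing a shard by hashing can be sketched as a hash taken modulo the number of primary shards. This is a simplified illustration: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen for this example, not necessarily the function Elasticsearch uses.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number with a stand-in hash (assumption: crc32)."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


# By default the routing value is the document ID, or the parent ID when one is specified.
child_shard = pick_shard("parent-1", 5)
parent_shard = pick_shard("parent-1", 5)
print(child_shard == parent_shard)
```

Under this scheme the result depends on the number of primary shards, which suggests why that number is fixed once an index has been created.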
- runner
- A local control agent that runs on all hosts, used to deploy local containers based on role definitions. Ensures that containers assigned to it exist and are able to run, and creates or recreates the containers if necessary.
- services forwarder
- Routes data internally in an Elastic Cloud Enterprise installation.
- shard
Other than defining the number of primary and replica shards that an index should have, you never need to refer to shards directly. Instead, your code should deal only with an index.
- shipper
- An instance of Logstash that sends events to another instance of Logstash, or some other application.
- shrink
- To reduce the number of shards in an index. See the shrink index API.
- source field
- By default, the JSON document that you index will be stored in the _source field and will be returned by all get and search requests. This gives you access to the original object directly from search results, rather than requiring a second step to retrieve the object from an ID.
- split
- To increase the number of shards in an index. See the split index API.
- stunnel
- Securely tunnels all traffic in an Elastic Cloud Enterprise installation.
- term
- A term is an exact value that is indexed in Elasticsearch. The terms foo, Foo, FOO are NOT equivalent. Terms (i.e. exact values) can be searched for using term queries. See also text and analysis.
- text
Text fields need to be analyzed at index time in order to be searchable as full text, and keywords in full text queries must be analyzed at search time to produce (and search for) the same terms that were generated at index time.
- type
- A type used to represent the type of document, e.g. an email, a user, or a tweet. Types are deprecated and are in the process of being removed. See Removal of mapping types.
- worker
- The filter thread model used by Logstash, where each worker receives an event and applies all filters, in order, before emitting the event to the output queue. This allows scalability across CPUs because many filters are CPU intensive.
- ZooKeeper
- A coordination service for distributed systems used by Elastic Cloud Enterprise to store the state of the installation. Responsible for discovery of hosts, resource allocation, leader election after failure and high priority notifications.