Preview the extracted features used by a data frame analytics config.
## Required authorization
- Cluster privileges: `monitor_ml`
Body
-
config (object)
-
source (object)
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
query (object)
-
bool (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
-
boosting (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the `negative` query.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
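The boosting attributes above can be sketched as a request fragment. This is a minimal illustration, not the documented payload: the attribute names `positive`, `negative`, and `negative_boost` are assumed from the Elasticsearch `boosting` query (the names did not survive in this listing), and the field `text` with its terms is hypothetical.

```python
import json


def boosting_query(positive, negative, negative_boost):
    """Build a boosting query body.

    Assumes the attribute names positive, negative, and negative_boost;
    negative_boost must lie between 0 and 1.0, as described above.
    """
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    return {
        "boosting": {
            "positive": positive,
            "negative": negative,
            "negative_boost": negative_boost,
        }
    }


# Hypothetical field name and terms, for illustration only.
query = boosting_query(
    positive={"term": {"text": "apple"}},
    negative={"term": {"text": "pie"}},
    negative_boost=0.5,
)
print(json.dumps(query, indent=2))
```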
-
-
combined_fields (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
List of fields to search. Field wildcard patterns are allowed. Only `text` fields are supported, and they must all have the same search `analyzer`.
-
Text to search for in the provided `fields`. The `combined_fields` query analyzes the provided text before performing a search.
-
If true, match phrase queries are automatically created for multi-term synonyms.
-
Values are `or` or `and`.
-
Values are `none` or `all`.
-
-
constant_score (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
-
dis_max (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
One or more query clauses. Returned documents must match one or more of these queries. If a document matches multiple queries, Elasticsearch uses the highest relevance score.
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
Floating point number between 0 and 1.0 used to increase the relevance scores of documents matching multiple query clauses.
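How the tie breaker interacts with the highest clause score can be sketched as simple arithmetic. The function below is a simplified model under the assumption that the final score is the best clause score plus the tie breaker times the sum of the other matching clause scores; the function name and the sample scores are hypothetical.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine per-clause scores the way a dis_max query is assumed to:
    take the highest score, then add tie_breaker times every other score."""
    if not clause_scores:
        raise ValueError("at least one matching clause score is required")
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker must be between 0 and 1.0")
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# With the default tie breaker of 0, only the best clause counts.
print(dis_max_score([1.0, 0.5, 0.2]))
# A non-zero tie breaker rewards documents that match several clauses.
print(dis_max_score([1.0, 0.5, 0.2], tie_breaker=0.3))
```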
-
-
exists (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
-
function_score (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Values are `multiply`, `replace`, `sum`, `avg`, `max`, or `min`.
-
One or more functions that compute a new score for each document returned by the query.
-
Restricts the new score to not exceed the provided limit.
-
Excludes documents that do not meet the provided score threshold.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
Values are `multiply`, `sum`, `avg`, `first`, `max`, or `min`.
-
-
Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
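For readers unfamiliar with the metric, here is a minimal sketch of the classic Levenshtein edit distance. It illustrates the measure only and is not Elasticsearch's implementation; note that plain Levenshtein counts swapping two adjacent characters as two edits.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("ab", "ba"))           # 2: a swap is two edits here
```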
-
geo_bounding_box (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Values are `memory` or `indexed`.
-
Values are `coerce`, `ignore_malformed`, or `strict`.
-
Set to `true` to ignore an unmapped field and not match any documents for this query. Set to `false` to throw an exception if the field is not mapped.
-
-
geo_distance (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Values are `arc` or `plane`.
-
Values are `coerce`, `ignore_malformed`, or `strict`.
-
Set to `true` to ignore an unmapped field and not match any documents for this query. Set to `false` to throw an exception if the field is not mapped.
-
-
Matches `geo_point` and `geo_shape` values that intersect a grid cell from a GeoGrid aggregation.
-
geo_polygon (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Values are `coerce`, `ignore_malformed`, or `strict`.
-
-
geo_shape (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Set to `true` to ignore an unmapped field and not match any documents for this query. Set to `false` to throw an exception if the field is not mapped.
-
-
has_child (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Indicates whether to ignore an unmapped `type` and not return any documents instead of an error.
-
inner_hits (object)
-
The maximum number of hits to return per `inner_hits`.
-
Inner hit starting document offset.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
Maximum number of child documents that match the query allowed for a returned parent document. If the parent document exceeds this limit, it is excluded from the search results.
-
Minimum number of child documents that match the query required to match the query for a returned parent document. If the parent document does not meet this limit, it is excluded from the search results.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
Values are `none`, `avg`, `sum`, `max`, or `min`.
-
-
has_parent (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Indicates whether to ignore an unmapped `parent_type` and not return any documents instead of an error. You can use this parameter to query multiple indices that may not contain the `parent_type`.
-
inner_hits (object)
-
The maximum number of hits to return per `inner_hits`.
-
Inner hit starting document offset.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
Indicates whether the relevance score of a matching parent document is aggregated into its child documents.
-
-
ids (object)
-
Returns documents based on the order and proximity of matching terms.
-
knn (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
The number of nearest neighbor candidates to consider per shard.
-
The final number of nearest neighbors to return as top hits.
-
The minimum similarity for a vector to be considered a match.
-
-
Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
-
match_all (object)
-
Analyzes its input and constructs a `bool` query from the terms. Each term except the last is used in a `term` query. The last term is used in a prefix query.
-
match_none (object)
-
Analyzes the text and creates a phrase query out of the analyzed text.
-
Returns documents that contain the words of a provided text, in the same order as provided. The last term of the provided text is treated as a prefix, matching any words that begin with that term.
-
more_like_this (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
The analyzer that is used to analyze the free form text. Defaults to the analyzer associated with the first field in fields.
-
Each term in the formed query could be further boosted by their tf-idf score. This sets the boost factor to use when using this feature. Defaults to deactivated (0).
-
Controls whether the query should fail (throw an exception) if any of the specified fields are not of the supported types (`text` or `keyword`).
-
A list of fields to fetch and analyze the text from. Defaults to the `index.query.default_field` index setting, which has a default value of `*`.
-
Specifies whether the input documents should also be included in the search results returned.
-
The maximum document frequency above which the terms are ignored from the input document.
-
The maximum number of query terms that can be selected.
-
The maximum word length above which the terms are ignored. Defaults to unbounded (`0`).
-
The minimum document frequency below which the terms are ignored from the input document.
-
The minimum term frequency below which the terms are ignored from the input document.
-
The minimum word length below which the terms are ignored.
stop_words (string | array[string])
Language value, such as `_arabic_` or `_thai_`. Defaults to `_english_`. Each language value corresponds to a predefined list of stop words in Lucene. See Stop words by language for supported language values and their stop words. Also accepts an array of stop words.
Values are `_arabic_`, `_armenian_`, `_basque_`, `_bengali_`, `_brazilian_`, `_bulgarian_`, `_catalan_`, `_cjk_`, `_czech_`, `_danish_`, `_dutch_`, `_english_`, `_estonian_`, `_finnish_`, `_french_`, `_galician_`, `_german_`, `_greek_`, `_hindi_`, `_hungarian_`, `_indonesian_`, `_irish_`, `_italian_`, `_latvian_`, `_lithuanian_`, `_norwegian_`, `_persian_`, `_portuguese_`, `_romanian_`, `_russian_`, `_serbian_`, `_sorani_`, `_spanish_`, `_swedish_`, `_thai_`, `_turkish_`, or `_none_`.
-
Values are `internal`, `external`, `external_gte`, or `force`.
-
-
multi_match (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Analyzer used to convert the text in the query value into tokens.
-
If `true`, match phrase queries are automatically created for multi-term synonyms.
-
If `true`, edits for fuzzy matching include transpositions of two adjacent characters (for example, `ab` to `ba`). Can be applied to the term subqueries constructed for all terms but the final term.
-
If `true`, format-based errors, such as providing a text query value for a numeric field, are ignored.
-
Maximum number of terms to which the query will expand.
-
Values are `and`, `AND`, `or`, or `OR`.
-
Number of beginning characters left unchanged for fuzzy matching.
-
Text, number, boolean value or date you wish to find in the provided field.
-
Maximum number of positions allowed between matching tokens.
-
Determines how scores for each per-term blended query and scores across groups are combined.
-
Values are `best_fields`, `most_fields`, `cross_fields`, `phrase`, `phrase_prefix`, or `bool_prefix`.
-
Values are `all` or `none`.
-
-
nested (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Indicates whether to ignore an unmapped path and not return any documents instead of an error.
-
inner_hits (object)
-
The maximum number of hits to return per `inner_hits`.
-
Inner hit starting document offset.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
Values are `none`, `avg`, `sum`, `max`, or `min`.
-
-
parent_id (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Indicates whether to ignore an unmapped `type` and not return any documents instead of an error.
-
-
percolate (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
The source of the document being percolated.
-
An array of sources of the documents being percolated.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
The suffix used for the `_percolator_document_slot` field when multiple `percolate` queries are specified.
-
Preference used to fetch document to percolate.
-
-
pinned (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
Document IDs listed in the order they are to appear in results. Required if `docs` is not specified.
-
Documents listed in the order they are to appear in results. Required if `ids` is not specified.
-
-
Returns documents that contain a specific prefix in a provided field.
-
query_string (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
If `true`, the wildcard characters `*` and `?` are allowed as the first character of the query string.
-
Analyzer used to convert text in the query string into tokens.
-
If `true`, the query attempts to analyze wildcard terms in the query string.
-
If `true`, match phrase queries are automatically created for multi-term synonyms.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
Values are `and`, `AND`, `or`, or `OR`.
-
If `true`, enable position increments in queries constructed from a `query_string` search.
-
Array of fields to search. Supports wildcards (`*`).
-
Maximum number of terms to which the query expands for fuzzy matching.
-
Number of beginning characters left unchanged for fuzzy matching.
-
If `true`, edits for fuzzy matching include transpositions of two adjacent characters (for example, `ab` to `ba`).
-
If `true`, format-based errors, such as providing a text value for a numeric field, are ignored.
-
Maximum number of automaton states required for the query.
-
Maximum number of positions allowed between matching tokens for phrases.
-
Query string you wish to parse and use for search.
-
Analyzer used to convert quoted text in the query string into tokens. For quoted text, this parameter overrides the analyzer specified in the `analyzer` parameter.
-
Suffix appended to quoted text in the query string. You can use this suffix to use a different analysis method for exact matches.
-
How to combine the queries generated from the individual search terms in the resulting `dis_max` query.
-
Values are `best_fields`, `most_fields`, `cross_fields`, `phrase`, `phrase_prefix`, or `bool_prefix`.
-
-
Returns documents that contain terms within a provided range.
-
rank_feature (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
-
Returns documents that contain terms matching a regular expression.
-
rule (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
-
script (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
-
script_score (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Documents with a score lower than this floating point number are excluded from the search results.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
-
semantic (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
The field to query, which must be a `semantic_text` field.
-
The query text.
-
-
shape (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
When set to `true`, the query ignores an unmapped field and will not match any documents.
-
-
simple_query_string (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Analyzer used to convert text in the query string into tokens.
-
If `true`, the query attempts to analyze wildcard terms in the query string.
-
If `true`, the parser creates a match_phrase query for each multi-position token.
-
Values are `and`, `AND`, `or`, or `OR`.
-
Array of fields you wish to search. Accepts wildcard expressions. You also can boost relevance scores for matches to particular fields using a caret (`^`) notation. Defaults to the `index.query.default_field` index setting, which has a default value of `*`.
-
Maximum number of terms to which the query expands for fuzzy matching.
-
Number of beginning characters left unchanged for fuzzy matching.
-
If `true`, edits for fuzzy matching include transpositions of two adjacent characters (for example, `ab` to `ba`).
-
If `true`, format-based errors, such as providing a text value for a numeric field, are ignored.
-
Query string in the simple query string syntax you wish to parse and use for search.
-
Suffix appended to quoted text in the query string.
-
-
span_containing (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
-
span_field_masking (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
-
span_first (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Controls the maximum end position permitted in a match.
-
-
span_multi (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
-
-
span_near (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Array of one or more other span type queries.
-
Controls whether matches are required to be in-order.
-
Controls the maximum number of intervening unmatched positions permitted.
-
-
span_not (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
The number of tokens from within the include span that can’t have overlap with the exclude span. Equivalent to setting both `pre` and `post`.
-
The number of tokens after the include span that can’t have overlap with the exclude span.
-
The number of tokens before the include span that can’t have overlap with the exclude span.
-
-
span_or (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Array of one or more other span type queries.
-
-
Matches spans containing a term.
-
span_within (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
-
sparse_vector (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
The query text you want to use for search. If inference_id is specified, query must also be specified.
-
Whether to perform pruning, omitting the non-significant tokens from the query to improve query performance. If prune is true but the pruning_config is not specified, pruning will occur but default values will be used. Default: false
-
Dictionary of precomputed sparse vectors and their associated weights. Only one of inference_id or query_vector may be supplied in a request.
-
-
Returns documents that contain an exact term in a provided field. To return a document, the query term must exactly match the queried field's value, including whitespace and capitalization.
-
terms (object)
-
Returns documents that contain a minimum number of exact terms in a provided field. To return a document, a required number of terms must exactly match the field values, including whitespace and capitalization.
-
Uses a natural language processing model to convert the query text into a list of token-weight pairs which are then used in a query against a sparse vector or rank features field.
-
Supports returning text_expansion query results by sending in precomputed tokens with the query.
-
Returns documents that contain terms matching a wildcard pattern.
-
wrapper (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
A base64-encoded query. The binary data format can be any of JSON, YAML, CBOR, or SMILE encodings.
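As a rough illustration of the encoding step, the sketch below base64-encodes a JSON query and decodes it again. The attribute name `query` inside `wrapper` is assumed from the Elasticsearch `wrapper` query, and the inner term query with its field and value is hypothetical.

```python
import base64
import json


def wrapper_query(inner_query: dict) -> dict:
    """Serialize inner_query as JSON and base64-encode it for a wrapper query.

    Assumes the encoded string is passed in an attribute named "query".
    """
    raw = json.dumps(inner_query).encode("utf-8")
    return {"wrapper": {"query": base64.b64encode(raw).decode("ascii")}}


# Hypothetical inner query, for illustration only.
wrapped = wrapper_query({"term": {"user.id": "kimchy"}})
print(wrapped)

# Decoding the string recovers the original query.
decoded = json.loads(base64.b64decode(wrapped["wrapper"]["query"]))
print(decoded)
```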
-
-
type (object)
-
Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
-
-
-
runtime_mappings (object)
-
* (object)
-
For type `composite`
-
For type `lookup`
-
A custom format for `date` type runtime fields.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
script (object)
-
Values are `boolean`, `composite`, `date`, `double`, `geo_point`, `geo_shape`, `ip`, `keyword`, `long`, or `lookup`.
-
-
-
_source (object)
-
An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to `excludes`; these fields are excluded from the analysis automatically.
-
An array of strings that defines the fields that will be included in the analysis.
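Putting the source attributes together, a preview request body might be assembled as follows. This is a sketch under assumptions: the attribute names `index`, `query`, `includes`, `excludes`, and `dependent_variable` are taken from the Elasticsearch API rather than from this listing, the index and field names are hypothetical, and the body is only built and printed, not sent.

```python
import json


def build_preview_body(index, query, dependent_variable,
                       includes=None, excludes=None):
    """Assemble a body for the preview data frame analytics request.

    The result is intended for POST _ml/data_frame/analytics/_preview.
    """
    source = {"index": index, "query": query}
    field_filter = {}
    if includes:
        field_filter["includes"] = list(includes)
    if excludes:
        field_filter["excludes"] = list(excludes)
    if field_filter:
        source["_source"] = field_filter
    analysis = {"classification": {"dependent_variable": dependent_variable}}
    return {"config": {"source": source, "analysis": analysis}}


# Hypothetical index and field names, for illustration only.
body = build_preview_body(
    index="houses-sold",
    query={"match_all": {}},
    dependent_variable="sold_quickly",
    excludes=["listing_id"],
)
print(json.dumps(body, indent=2))
```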
-
-
analysis (object)
-
classification (object)
-
Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.
-
Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as continuous target variable. For classification analysis, the data type of the field must be numeric (`integer`, `short`, `long`, `byte`), categorical (`ip` or `keyword`), or `boolean`. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.
-
Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.
-
Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.
-
Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.
-
Advanced configuration option. Specifies the rate at which `eta` increases for each new tree that is added to the forest. For example, a rate of 1.05 increases `eta` by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.
-
Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.
-
Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.
-
Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.
-
Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.
-
Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.
-
Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.
-
Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.
-
Path to a field or an array of paths. Some APIs support wildcards in the path to select multiple fields.
-
Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).
-
Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.
-
Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.
-
Defines the number of categories for which the predicted probabilities are reported. It must be non-negative or -1. If it is -1 or greater than the total number of categories, probabilities are reported for all categories; if you have a large number of categories, there could be a significant effect on the size of your destination index. NOTE: To use the AUC ROC evaluation method, num_top_classes must be set to -1 or a value greater than or equal to the total number of categories.
-
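The description of the eta growth rate can be illustrated with a small calculation. The following is a minimal sketch, assuming the rate is applied geometrically (tree k uses eta multiplied by the rate raised to the power k) and that no cap is applied afterwards; the function name `eta_schedule` is hypothetical and this is not taken from the Elasticsearch implementation.

```python
def eta_schedule(eta: float, growth_rate: float, num_trees: int) -> list[float]:
    """Return the shrinkage assumed for each tree under geometric growth."""
    # Ranges taken from the parameter descriptions above.
    if not 0.001 <= eta <= 1:
        raise ValueError("eta must be between 0.001 and 1")
    if not 0.5 <= growth_rate <= 2:
        raise ValueError("growth rate must be between 0.5 and 2")
    # Assumption: tree k uses eta * growth_rate ** k, with no upper cap.
    return [eta * growth_rate ** k for k in range(num_trees)]


schedule = eta_schedule(eta=0.1, growth_rate=1.05, num_trees=4)
for tree_index, value in enumerate(schedule):
    print(f"tree {tree_index}: eta = {value:.6f}")
```

With a rate of 1.05, each value in the list is 5% larger than the one before it, which matches the example in the description.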
-
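The num_top_classes rule can be sketched as follows. This is an illustration only: the function name `reported_classes` is hypothetical, and ranking the categories by descending probability is an assumption about what "top" means here.

```python
def reported_classes(probabilities: dict[str, float], num_top_classes: int) -> dict[str, float]:
    """Pick the categories whose probabilities would be reported."""
    if num_top_classes < -1:
        raise ValueError("num_top_classes must be non-negative or -1")
    # Assumption: "top" means highest predicted probability first.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # -1, or a value at least as large as the number of categories, reports everything.
    if num_top_classes == -1 or num_top_classes >= len(ranked):
        return dict(ranked)
    return dict(ranked[:num_top_classes])


probs = {"cat": 0.2, "dog": 0.7, "bird": 0.1}
print(reported_classes(probs, 2))
print(reported_classes(probs, -1))
```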
outlier_detection attributes (object)
-
Specifies whether the feature influence calculation is enabled.
-
The minimum outlier score that a document needs to have in order to calculate its feature influence score. Value range: 0-1.
-
The method that outlier detection uses. Available methods are lof, ldof, distance_kth_nn, distance_knn, and ensemble. The default value is ensemble, which means that outlier detection uses an ensemble of different methods and normalizes and combines their individual outlier scores to obtain the overall outlier score.
-
Defines the value for how many nearest neighbors each method of outlier detection uses to calculate its outlier score. When the value is not set, different values are used for different ensemble members. This default behavior helps improve the diversity in the ensemble; only override it if you are confident that the value you choose is appropriate for the data set.
-
The proportion of the data set that is assumed to be outlying prior to outlier detection. For example, 0.05 means it is assumed that 5% of values are real outliers and 95% are inliers.
-
If true, the following operation is performed on the columns before computing outlier scores: (x_i - mean(x_i)) / sd(x_i).
-
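The standardization formula above can be written out as a short sketch. It assumes the population standard deviation and maps constant columns to 0.0 to avoid dividing by zero; neither choice is stated in the description, and the function name `standardize_columns` is hypothetical.

```python
import statistics


def standardize_columns(rows: list[list[float]]) -> list[list[float]]:
    """Apply (x_i - mean(x_i)) / sd(x_i) to every column of a row-major table."""
    columns = list(zip(*rows))
    means = [statistics.fmean(column) for column in columns]
    # Assumption: population standard deviation; the text only says "sd".
    sds = [statistics.pstdev(column) for column in columns]
    return [
        # Assumption: a constant column (sd == 0) is mapped to 0.0.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
standardized = standardize_columns(rows)
for row in standardized:
    print([round(value, 4) for value in row])
```

After this step every column has mean 0 and standard deviation 1, so distance-based outlier scores are not dominated by the column with the largest scale.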
-
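As an illustration of how the outlier detection options fit together, the sketch below builds the analysis part of a config. The key names method, n_neighbors, and feature_influence_threshold are not shown in this text; they are assumed from the Elasticsearch data frame analytics API and should be checked against the reference for your version. The helper `outlier_detection_config` is hypothetical.

```python
# Methods listed in the description above.
ALLOWED_METHODS = {"lof", "ldof", "distance_kth_nn", "distance_knn", "ensemble"}


def outlier_detection_config(method="ensemble", n_neighbors=None, feature_influence_threshold=None):
    """Build an outlier_detection analysis object, checking the documented constraints."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown method: {method}")
    config = {"method": method}
    if n_neighbors is not None:
        # Left unset by default so each ensemble member can use its own value.
        config["n_neighbors"] = n_neighbors
    if feature_influence_threshold is not None:
        if not 0 <= feature_influence_threshold <= 1:
            raise ValueError("feature_influence_threshold must be in the range 0-1")
        config["feature_influence_threshold"] = feature_influence_threshold
    return {"outlier_detection": config}


analysis = outlier_detection_config(method="lof", n_neighbors=20, feature_influence_threshold=0.1)
print(analysis)
```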
regression attributes (object)
-
Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.
-
Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as the continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.
-
Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.
-
Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.
-
Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.
-
Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.
-
Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.
-
Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.
-
Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.
-
Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.
-
Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.
-
Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.
-
Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.
-
Path to a field or an array of paths. Some APIs support wildcards in the path to select multiple fields.
-
Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).
-
Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.
-
Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.
-
The loss function used during regression. Available options are mse (mean squared error), msle (mean squared logarithmic error), and huber (Pseudo-Huber loss).
-
A positive number that is used as a parameter to the loss_function.
-
-
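For orientation, the three loss options can be written in their common textbook per-sample forms. This is a sketch, not the Elasticsearch implementation: how the positive loss function parameter enters each loss is an assumption (an offset inside the logarithm for msle, the transition scale delta for the Pseudo-Huber loss), and the function names are hypothetical.

```python
import math


def mse(actual: float, predicted: float) -> float:
    """Squared error for one sample."""
    return (actual - predicted) ** 2


def msle(actual: float, predicted: float, offset: float = 1.0) -> float:
    """Squared logarithmic error; `offset` keeps the logarithm defined at 0 (assumed role)."""
    return (math.log(actual + offset) - math.log(predicted + offset)) ** 2


def pseudo_huber(actual: float, predicted: float, delta: float = 1.0) -> float:
    """Pseudo-Huber loss: roughly quadratic for small residuals, roughly linear for large ones."""
    residual = actual - predicted
    return delta ** 2 * (math.sqrt(1 + (residual / delta) ** 2) - 1)


for predicted in (9.0, 100.0):
    print(predicted, mse(10.0, predicted), msle(10.0, predicted), pseudo_huber(10.0, predicted))
```

The practical difference is sensitivity to large errors: mse grows quadratically with the residual, while the Pseudo-Huber loss grows about linearly once the residual is much larger than delta, and msle compares values on a logarithmic scale.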
-
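The numeric limits stated for the classification and regression hyperparameters can be collected into a small pre-flight check. This is a sketch under assumptions: most parameter names are not shown in this text, so the keys below (alpha, downsample_factor, eta, eta_growth_rate_per_tree, gamma, lambda, max_trees, soft_tree_depth_limit, soft_tree_depth_tolerance) are assumed from the Elasticsearch data frame analytics API, and `find_violations` is a hypothetical helper.

```python
# Each entry pairs a check with the limit quoted in the descriptions above.
CONSTRAINTS = {
    "alpha": (lambda v: v >= 0, "must be greater than or equal to zero"),
    "downsample_factor": (lambda v: 0 < v <= 1, "must be greater than zero and at most 1"),
    "eta": (lambda v: 0.001 <= v <= 1, "must be between 0.001 and 1"),
    "eta_growth_rate_per_tree": (lambda v: 0.5 <= v <= 2, "must be between 0.5 and 2"),
    "gamma": (lambda v: v >= 0, "must be nonnegative"),
    "lambda": (lambda v: v >= 0, "must be nonnegative"),
    "max_trees": (lambda v: v <= 2000, "must not exceed 2000"),
    "soft_tree_depth_limit": (lambda v: v >= 0, "must be greater than or equal to 0"),
    "soft_tree_depth_tolerance": (lambda v: v >= 0.01, "must be greater than or equal to 0.01"),
}


def find_violations(analysis_params: dict) -> list[str]:
    """Return one message per parameter that breaks a documented limit."""
    problems = []
    for name, value in analysis_params.items():
        if name in CONSTRAINTS:
            check, message = CONSTRAINTS[name]
            if not check(value):
                problems.append(f"{name} {message} (got {value})")
    return problems


print(find_violations({"eta": 0.1, "max_trees": 500}))
print(find_violations({"eta": 2, "gamma": -1, "dependent_variable": "price"}))
```

Parameters that are omitted are not checked, which matches the default behavior of leaving them to hyperparameter optimization.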
analyzed_fields attributes (object)
-
An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.
-
An array of strings that defines the fields that will be included in the analysis.
-
-
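The interaction of the two analyzed_fields arrays can be sketched as a simple filter. This illustration assumes exact field-name matching, that an empty or missing includes array means "all candidate fields", and that excludes takes precedence over includes; none of these rules is stated in this text, and `select_fields` is a hypothetical helper.

```python
def select_fields(candidate_fields, includes=None, excludes=None):
    """Return the candidate fields that remain after applying includes and excludes."""
    # Assumption: no includes array means every candidate field is eligible.
    included = set(includes) if includes else set(candidate_fields)
    excluded = set(excludes or [])
    # Assumption: a field named in both arrays is excluded.
    return [name for name in candidate_fields if name in included and name not in excluded]


fields = ["price", "bedrooms", "postcode", "listing_id"]
print(select_fields(fields, excludes=["listing_id"]))
print(select_fields(fields, includes=["price", "bedrooms"], excludes=["bedrooms"]))
```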
POST _ml/data_frame/analytics/_preview
{
  "config": {
    "source": {
      "index": "houses_sold_last_10_yrs"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "price"
      }
    }
  }
}
curl \
 --request POST 'http://api.example.com/_ml/data_frame/analytics/_preview' \
 --header "Content-Type: application/json" \
 --data '{"config": {"source": {"index": "houses_sold_last_10_yrs"}, "analysis": {"regression": {"dependent_variable": "price"}}}}'
{
  "config": {
    "source": {
      "index": "houses_sold_last_10_yrs"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "price"
      }
    }
  }
}
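The same request can be prepared from Python with only the standard library. The sketch below builds the request object without sending it; the base URL http://localhost:9200 is a placeholder assumption, authentication is omitted, and `build_preview_request` is a hypothetical helper. The body is serialized once as a JSON object, not wrapped in an extra layer of string quoting.

```python
import json
import urllib.request


def build_preview_request(base_url: str, index: str, dependent_variable: str) -> urllib.request.Request:
    """Prepare a POST to the preview endpoint with a regression config in the body."""
    body = {
        "config": {
            "source": {"index": index},
            "analysis": {"regression": {"dependent_variable": dependent_variable}},
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/_ml/data_frame/analytics/_preview",
        data=json.dumps(body).encode("utf-8"),  # a JSON object, serialized exactly once
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host; replace with your cluster address and add credentials as needed.
request = build_preview_request("http://localhost:9200", "houses_sold_last_10_yrs", "price")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually send it, pass the object to `urllib.request.urlopen(request)`; the caller needs the monitor_ml cluster privilege noted at the top of this page.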