Preview features used by data frame analytics

Generally available; Added in 7.13.0

POST /_ml/data_frame/analytics/{id}/_preview

Preview the extracted features used by a data frame analytics config.

Required authorization

  • Cluster privileges: monitor_ml

Path parameters

  • id string Required

    Identifier for the data frame analytics job.
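
As a rough illustration of how the path parameter slots into the endpoint, the sketch below builds the request path in Python. The helper name `preview_path` and the job identifier are hypothetical; only the path template comes from this page.

```python
from urllib.parse import quote


def preview_path(job_id: str) -> str:
    """Return the preview request path for a data frame analytics job."""
    # quote() percent-encodes characters that are unsafe in a path segment.
    return f"/_ml/data_frame/analytics/{quote(job_id, safe='')}/_preview"


print(preview_path("my-analytics-job"))
```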

application/json

Body

  • config object
    • source object Required
      • index string | array[string] Required
      • query object

        An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

        • bool object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • filter object | array[object]

            The clause (query) must appear in matching documents. However, unlike must, the score of the query will be ignored.

            One of:

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • minimum_should_match number | string

            The minimum number of terms that should match, specified as an integer, a percentage, or a range.

          • must object | array[object]

            The clause (query) must appear in matching documents and will contribute to the score.

            One of:

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • must_not object | array[object]

            The clause (query) must not appear in the matching documents. Because scoring is ignored, a score of 0 is returned for all documents.

            One of:

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • should object | array[object]

            The clause (query) should appear in the matching document.

            One of:

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
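
To make the four bool clauses concrete, here is a minimal sketch of a request body that places a bool query under `config.source.query`. The index name, field names, and document ID are hypothetical, and other `config` attributes are left out.

```python
import json

# Hypothetical field names and document ID, for illustration only.
bool_query = {
    "bool": {
        "must": [{"match_all": {}}],                    # must match, contributes to the score
        "filter": [{"exists": {"field": "price"}}],     # must match, score ignored
        "must_not": [{"ids": {"values": ["doc-1"]}}],   # must not match
        "should": [{"exists": {"field": "discount"}}],  # counted against minimum_should_match
        "minimum_should_match": 1,
    }
}

body = {"config": {"source": {"index": "my-index", "query": bool_query}}}
print(json.dumps(body, indent=2))
```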

        • boosting object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • negative_boost number Required

            Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the negative query.

          • negative object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • positive object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
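
The three required boosting attributes can be assembled as in the following sketch. The helper `boosting_query` and the field name are hypothetical; the range check simply mirrors the 0 to 1.0 range stated for `negative_boost`.

```python
def boosting_query(positive: dict, negative: dict, negative_boost: float) -> dict:
    """Build a boosting query, rejecting a negative_boost outside 0..1.0."""
    if not 0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    return {
        "boosting": {
            "positive": positive,
            "negative": negative,
            "negative_boost": negative_boost,
        }
    }


# Demote (rather than exclude) documents that have a hypothetical "discontinued" field.
query = boosting_query(
    positive={"match_all": {}},
    negative={"exists": {"field": "discontinued"}},
    negative_boost=0.5,
)
print(query)
```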

        • common object Deprecated
        • combined_fields object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • fields array[string] Required

            List of fields to search. Field wildcard patterns are allowed. Only text fields are supported, and they must all have the same search analyzer.

          • query string Required

            Text to search for in the provided fields. The combined_fields query analyzes the provided text before performing a search.

          • auto_generate_synonyms_phrase_query boolean

            If true, match phrase queries are automatically created for multi-term synonyms.

          • operator string

            Values are or or and.

          • minimum_should_match number | string

            The minimum number of terms that should match, specified as an integer, a percentage, or a range.

          • zero_terms_query string

            Values are none or all.

        • constant_score object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • filter object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

        • dis_max object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • queries array[object] Required

            One or more query clauses. Returned documents must match one or more of these queries. If a document matches multiple queries, Elasticsearch uses the highest relevance score.

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • tie_breaker number

            Floating point number between 0 and 1.0 used to increase the relevance scores of documents matching multiple query clauses.
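
How the highest score and `tie_breaker` interact can be sketched numerically. The function below is a simplified model of the combination (best clause score plus `tie_breaker` times the scores of the other matching clauses), not Elasticsearch code; the name `dis_max_score` and the sample scores are made up.

```python
from typing import Sequence


def dis_max_score(clause_scores: Sequence[float], tie_breaker: float = 0.0) -> float:
    """Simplified model: best score plus tie_breaker times the remaining scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# With tie_breaker 0 only the best clause counts; a larger value rewards extra matches.
print(dis_max_score([2.0, 1.0]))
print(dis_max_score([2.0, 1.0], tie_breaker=0.5))
```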

        • exists object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • field string Required

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

        • function_score object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • boost_mode string

            Values are multiply, replace, sum, avg, max, or min.

          • functions array[object]

            One or more functions that compute a new score for each document returned by the query.

          • max_boost number

            Restricts the new score to not exceed the provided limit.

          • min_score number

            Excludes documents that do not meet the provided score threshold.

          • query object

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • score_mode string

            Values are multiply, sum, avg, first, max, or min.
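
A function_score query could be put together along these lines. The helper name and field name are hypothetical, and the `filter` and `weight` keys inside each function are not described on this page; they are assumed from the function_score query documentation. The two sets restate the allowed values listed above.

```python
BOOST_MODES = {"multiply", "replace", "sum", "avg", "max", "min"}
SCORE_MODES = {"multiply", "sum", "avg", "first", "max", "min"}


def function_score_query(query: dict, functions: list, boost_mode: str, score_mode: str) -> dict:
    """Build a function_score query after checking the enumerated modes."""
    if boost_mode not in BOOST_MODES:
        raise ValueError(f"unsupported boost_mode: {boost_mode}")
    if score_mode not in SCORE_MODES:
        raise ValueError(f"unsupported score_mode: {score_mode}")
    return {
        "function_score": {
            "query": query,
            "functions": functions,
            "boost_mode": boost_mode,
            "score_mode": score_mode,
        }
    }


fs_query = function_score_query(
    query={"match_all": {}},
    # Assumed function shape: weight documents that have a hypothetical "featured" field.
    functions=[{"filter": {"exists": {"field": "featured"}}, "weight": 2}],
    boost_mode="sum",
    score_mode="first",
)
print(fs_query)
```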

        • fuzzy object

          Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.

        • geo_bounding_box object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • type string

            Values are memory or indexed.

          • validation_method string

            Values are coerce, ignore_malformed, or strict.

          • ignore_unmapped boolean

            Set to true to ignore an unmapped field and not match any documents for this query. Set to false to throw an exception if the field is not mapped.

        • geo_distance object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • distance string Required
          • distance_type string

            Values are arc or plane.

          • validation_method string

            Values are coerce, ignore_malformed, or strict.

          • ignore_unmapped boolean

            Set to true to ignore an unmapped field and not match any documents for this query. Set to false to throw an exception if the field is not mapped.

        • geo_grid object

          Matches geo_point and geo_shape values that intersect a grid cell from a GeoGrid aggregation.

        • geo_polygon object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • validation_method string

            Values are coerce, ignore_malformed, or strict.

          • ignore_unmapped boolean
        • geo_shape object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • ignore_unmapped boolean

            Set to true to ignore an unmapped field and not match any documents for this query. Set to false to throw an exception if the field is not mapped.

        • has_child object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • ignore_unmapped boolean

            Indicates whether to ignore an unmapped type and not return any documents instead of an error.

          • inner_hits object
            • name string
            • size number

              The maximum number of hits to return per inner_hits.

            • from number

              Inner hit starting document offset.

            • collapse object
            • docvalue_fields array[object]
            • explain boolean
            • ignore_unmapped boolean
            • script_fields object
            • seq_no_primary_term boolean
            • fields array[string]

              Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

            • sort
            • _source
            • stored_fields string | array[string]
            • track_scores boolean
            • version boolean
          • max_children number

            Maximum number of child documents that match the query allowed for a returned parent document. If the parent document exceeds this limit, it is excluded from the search results.

          • min_children number

            Minimum number of child documents that match the query required to match the query for a returned parent document. If the parent document does not meet this limit, it is excluded from the search results.

          • query object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • score_mode string

            Values are none, avg, sum, max, or min.

          • type string Required
        • has_parent object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • ignore_unmapped boolean

            Indicates whether to ignore an unmapped parent_type and not return any documents instead of an error. You can use this parameter to query multiple indices that may not contain the parent_type.

          • inner_hits object
            • name string
            • size number

              The maximum number of hits to return per inner_hits.

            • from number

              Inner hit starting document offset.

            • collapse object
            • docvalue_fields array[object]
            • explain boolean
            • ignore_unmapped boolean
            • script_fields object
            • seq_no_primary_term boolean
            • fields array[string]

              Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

            • sort
            • _source
            • stored_fields string | array[string]
            • track_scores boolean
            • version boolean
          • parent_type string Required
          • query object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • score boolean

            Indicates whether the relevance score of a matching parent document is aggregated into its child documents.

        • ids object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • values string | array[string]

        • intervals object

          Returns documents based on the order and proximity of matching terms.

        • knn object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • field string Required

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • query_vector array[number]
          • query_vector_builder object
            • text_embedding object
          • num_candidates number

            The number of nearest neighbor candidates to consider per shard.

          • k number

            The final number of nearest neighbors to return as top hits.

          • filter object | array[object]

            Filters for the kNN search query.

            One of:

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • similarity number

            The minimum similarity for a vector to be considered a match.

          • rescore_vector object
            • oversample number Required

              Applies the specified oversample factor to k on the approximate kNN search.

        • match object

          Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.

        • match_all object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
        • match_bool_prefix object

          Analyzes its input and constructs a bool query from the terms. Each term except the last is used in a term query. The last term is used in a prefix query.

        • match_none object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
        • match_phrase object

          Analyzes the text and creates a phrase query out of the analyzed text.

        • match_phrase_prefix object

          Returns documents that contain the words of a provided text, in the same order as provided. The last term of the provided text is treated as a prefix, matching any words that begin with that term.

        • more_like_this object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • analyzer string

            The analyzer that is used to analyze the free form text. Defaults to the analyzer associated with the first field in fields.

          • boost_terms number

            Each term in the formed query can be further boosted by its tf-idf score. This parameter sets the boost factor to use when this feature is enabled. Defaults to deactivated (0).

          • fail_on_unsupported_field boolean

            Controls whether the query should fail (throw an exception) if any of the specified fields are not of the supported types (text or keyword).

          • fields array[string]

            A list of fields to fetch and analyze the text from. Defaults to the index.query.default_field index setting, which has a default value of *.

          • include boolean

            Specifies whether the input documents should also be included in the search results returned.

          • like array[string | object] Required
          • max_doc_freq number

            The maximum document frequency above which the terms are ignored from the input document.

          • max_query_terms number

            The maximum number of query terms that can be selected.

          • max_word_length number

            The maximum word length above which the terms are ignored. Defaults to unbounded (0).

          • min_doc_freq number

            The minimum document frequency below which the terms are ignored from the input document.

          • minimum_should_match number | string

            The minimum number of terms that should match, specified as an integer, a percentage, or a range.

          • min_term_freq number

            The minimum term frequency below which the terms are ignored from the input document.

          • min_word_length number

            The minimum word length below which the terms are ignored.

          • routing string
          • stop_words string | array[string]

            Language value, such as arabic or thai. Defaults to english. Each language value corresponds to a predefined list of stop words in Lucene. See Stop words by language for supported language values and their stop words. Also accepts an array of stop words.

            One of:

            Values are _arabic_, _armenian_, _basque_, _bengali_, _brazilian_, _bulgarian_, _catalan_, _cjk_, _czech_, _danish_, _dutch_, _english_, _estonian_, _finnish_, _french_, _galician_, _german_, _greek_, _hindi_, _hungarian_, _indonesian_, _irish_, _italian_, _latvian_, _lithuanian_, _norwegian_, _persian_, _portuguese_, _romanian_, _russian_, _serbian_, _sorani_, _spanish_, _swedish_, _thai_, _turkish_, or _none_.

          • unlike array[string | object]
          • version number
          • version_type string

            Values are internal, external, external_gte, or force.
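
For orientation, a more_like_this query using a few of the attributes above might look like the sketch below. The field names, sample text, and numeric thresholds are hypothetical choices, not recommended values.

```python
import json

mlt_query = {
    "more_like_this": {
        "fields": ["title", "description"],          # hypothetical text fields
        "like": ["bright kitchen with garden view"],  # free text to find similar documents for
        "min_term_freq": 1,                           # ignore terms rarer than this in the input
        "max_query_terms": 12,                        # cap on the number of selected terms
        "stop_words": ["with"],                       # array form of stop_words
    }
}

print(json.dumps({"config": {"source": {"index": "my-index", "query": mlt_query}}}, indent=2))
```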

        • multi_match object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • analyzer string

            Analyzer used to convert the text in the query value into tokens.

          • auto_generate_synonyms_phrase_query boolean

            If true, match phrase queries are automatically created for multi-term synonyms.

          • cutoff_frequency number Deprecated
          • fields string | array[string]
          • fuzziness string | number

          • fuzzy_rewrite string
          • fuzzy_transpositions boolean

            If true, edits for fuzzy matching include transpositions of two adjacent characters (for example, ab to ba). Can be applied to the term subqueries constructed for all terms but the final term.

          • lenient boolean

            If true, format-based errors, such as providing a text query value for a numeric field, are ignored.

          • max_expansions number

            Maximum number of terms to which the query will expand.

          • minimum_should_match number | string

            The minimum number of terms that should match, specified as an integer, a percentage, or a range.

          • operator string

            Values are and, AND, or, or OR.

          • prefix_length number

            Number of beginning characters left unchanged for fuzzy matching.

          • query string Required

            Text, number, boolean value or date you wish to find in the provided field.

          • slop number

            Maximum number of positions allowed between matching tokens.

          • tie_breaker number

            Determines how scores for each per-term blended query and scores across groups are combined.

          • type string

            Values are best_fields, most_fields, cross_fields, phrase, phrase_prefix, or bool_prefix.

          • zero_terms_query string

            Values are all or none.
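
The sketch below shows one way to assemble a multi_match query while guarding the `type` value against the list above. The helper `multi_match_query`, the search text, and the field names are hypothetical.

```python
MULTI_MATCH_TYPES = {
    "best_fields", "most_fields", "cross_fields",
    "phrase", "phrase_prefix", "bool_prefix",
}


def multi_match_query(text: str, fields: list, match_type: str, **options) -> dict:
    """Build a multi_match query; extra keyword options are passed through unchanged."""
    if match_type not in MULTI_MATCH_TYPES:
        raise ValueError(f"unsupported type: {match_type}")
    return {"multi_match": {"query": text, "fields": fields, "type": match_type, **options}}


mm_query = multi_match_query(
    "lakeside cottage",
    ["title", "description"],
    "most_fields",
    operator="and",
    lenient=True,
)
print(mm_query)
```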

        • nested object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • ignore_unmapped boolean

            Indicates whether to ignore an unmapped path and not return any documents instead of an error.

          • inner_hits object
            • name string
            • size number

              The maximum number of hits to return per inner_hits.

            • from number

              Inner hit starting document offset.

            • collapse object
            • docvalue_fields array[object]
            • explain boolean
            • ignore_unmapped boolean
            • script_fields object
            • seq_no_primary_term boolean
            • fields array[string]

              Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

            • sort
            • _source
            • stored_fields string | array[string]
            • track_scores boolean
            • version boolean
          • path string Required

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • query object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • score_mode string

            Values are none, avg, sum, max, or min.
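
A nested query wraps an inner query that runs against one nested path. The sketch below assumes a hypothetical nested field named `rooms` with a sub-field `rooms.area`; neither exists unless your mapping defines them.

```python
import json

nested_query = {
    "nested": {
        "path": "rooms",                                   # hypothetical nested field
        "query": {"exists": {"field": "rooms.area"}},      # inner query on a sub-field
        "score_mode": "max",                               # one of none, avg, sum, max, min
        "ignore_unmapped": True,                           # no error if "rooms" is unmapped
        "inner_hits": {"size": 3, "from": 0},              # return up to 3 matching nested docs
    }
}

print(json.dumps(nested_query, indent=2))
```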

        • parent_id object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • id string
          • ignore_unmapped boolean

            Indicates whether to ignore an unmapped type and not return any documents instead of an error.

          • type string
        • percolate object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • document object

            The source of the document being percolated.

          • documents array[object]

            An array of sources of the documents being percolated.

          • field string Required

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • id string
          • index string
          • name string

            The suffix used for the _percolator_document_slot field when multiple percolate queries are specified.

          • preference string

            Preference used to fetch document to percolate.

          • routing string
          • version number
        • pinned object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • organic object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • ids array[string]

            Document IDs listed in the order they are to appear in results. Required if docs is not specified.

          • docs array[object]

            Documents listed in the order they are to appear in results. Required if ids is not specified.
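
Because `ids` and `docs` are each required when the other is absent, a builder can check that at least one is supplied. This is a minimal sketch with a hypothetical helper name and made-up document IDs.

```python
from typing import Optional


def pinned_query(organic: dict, ids: Optional[list] = None, docs: Optional[list] = None) -> dict:
    """Build a pinned query; at least one of ids or docs must be given."""
    if ids is None and docs is None:
        raise ValueError("specify ids or docs")
    pinned = {"organic": organic}
    if ids is not None:
        pinned["ids"] = ids    # order here is the order in the results
    if docs is not None:
        pinned["docs"] = docs
    return {"pinned": pinned}


pq = pinned_query({"match_all": {}}, ids=["doc-7", "doc-3"])
print(pq)
```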

        • prefix object

          Returns documents that contain a specific prefix in a provided field.

        • query_string object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • allow_leading_wildcard boolean

            If true, the wildcard characters * and ? are allowed as the first character of the query string.

          • analyzer string

            Analyzer used to convert text in the query string into tokens.

          • analyze_wildcard boolean

            If true, the query attempts to analyze wildcard terms in the query string.

          • auto_generate_synonyms_phrase_query boolean

            If true, match phrase queries are automatically created for multi-term synonyms.

          • default_field string

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • default_operator string

            Values are and, AND, or, or OR.

          • enable_position_increments boolean

            If true, enable position increments in queries constructed from a query_string search.

          • escape boolean
          • fields array[string]

            Array of fields to search. Supports wildcards (*).

          • fuzziness string | number

          • fuzzy_max_expansions number

            Maximum number of terms to which the query expands for fuzzy matching.

          • fuzzy_prefix_length number

            Number of beginning characters left unchanged for fuzzy matching.

          • fuzzy_rewrite string
          • fuzzy_transpositions boolean

            If true, edits for fuzzy matching include transpositions of two adjacent characters (for example, ab to ba).

          • lenient boolean

            If true, format-based errors, such as providing a text value for a numeric field, are ignored.

          • max_determinized_states number

            Maximum number of automaton states required for the query.

          • minimum_should_match number | string

            The minimum number of terms that should match, specified as an integer, a percentage, or a range.

          • phrase_slop number

            Maximum number of positions allowed between matching tokens for phrases.

          • query string Required

            Query string you wish to parse and use for search.

          • quote_analyzer string

            Analyzer used to convert quoted text in the query string into tokens. For quoted text, this parameter overrides the analyzer specified in the analyzer parameter.

          • quote_field_suffix string

            Suffix appended to quoted text in the query string. You can use this suffix to use a different analysis method for exact matches.

          • rewrite string
          • tie_breaker number

            How to combine the queries generated from the individual search terms in the resulting dis_max query.

          • time_zone string
          • type string

            Values are best_fields, most_fields, cross_fields, phrase, phrase_prefix, or bool_prefix.
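
As an example of combining several query_string attributes, the sketch below builds a query with a default field, a default operator, and lenient parsing. The query text and field names are hypothetical.

```python
import json

qs_query = {
    "query_string": {
        "query": "cottage OR cabin",     # parsed with the query string syntax
        "default_field": "description",  # hypothetical field used when none is named
        "default_operator": "AND",       # one of and, AND, or, OR
        "lenient": True,                 # ignore format-based errors
        "allow_leading_wildcard": False, # reject terms that start with * or ?
    }
}

body = {"config": {"source": {"index": ["my-index-1", "my-index-2"], "query": qs_query}}}
print(json.dumps(body, indent=2))
```

The `index` value is written as an array here to show the `array[string]` form; a single string is also accepted.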

        • range object

          Returns documents that contain terms within a provided range.

        • rank_feature object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • field string Required

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • saturation object
          • log object
          • linear object
          • sigmoid object
        • regexp object

          Returns documents that contain terms matching a regular expression.

          External documentation
        • rule object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • organic object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • ruleset_ids string | array[string]

          • ruleset_id string
          • match_criteria object Required
        • script object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • script object Required
            • source
            • id string
            • params object

              Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

            • lang
            • options object
        • script_score object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • min_score number

            Documents with a score lower than this floating point number are excluded from the search results.

          • query object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

          • script object Required
            • source
            • id string
            • params object

              Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

            • lang
            • options object
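As a rough illustration of how the script_score attributes above fit together, the following Python sketch assembles a query object. The field names `message` and `likes`, the parameter name `factor`, and the Painless expression are hypothetical placeholders, not values taken from this reference.

```python
import json


def script_score_query(inner_query, source, params, min_score=None):
    """Build a script_score query from the attributes listed above."""
    body = {
        "query": inner_query,  # required: the query whose hits are re-scored
        "script": {            # required: the scoring script
            "source": source,
            # Named parameters keep literal values out of the script source.
            "params": params,
        },
    }
    if min_score is not None:
        # Documents scoring below this value are excluded from the results.
        body["min_score"] = min_score
    return {"script_score": body}


# Hypothetical example values.
query = script_score_query(
    inner_query={"match": {"message": "elasticsearch"}},
    source="doc['likes'].value * params.factor",
    params={"factor": 1.5},
    min_score=1.0,
)
print(json.dumps(query, indent=2))
```

Passing `factor` through `params` rather than writing the number into the script source follows the advice in the params description above.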
        • semantic object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • field string Required

            The field to query, which must be of the semantic_text field type.

          • query string Required

            The query text.

        • shape object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • ignore_unmapped boolean

            When set to true, the query ignores an unmapped field and will not match any documents.

        • simple_query_string object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • analyzer string

            Analyzer used to convert text in the query string into tokens.

          • analyze_wildcard boolean

            If true, the query attempts to analyze wildcard terms in the query string.

          • auto_generate_synonyms_phrase_query boolean

            If true, the parser creates a match_phrase query for each multi-position token.

          • default_operator string

            Values are and, AND, or, or OR.

          • fields array[string]

            Array of fields you wish to search. Accepts wildcard expressions. You also can boost relevance scores for matches to particular fields using a caret (^) notation. Defaults to the index.query.default_field index setting, which has a default value of *.

          • flags
          • fuzzy_max_expansions number

            Maximum number of terms to which the query expands for fuzzy matching.

          • fuzzy_prefix_length number

            Number of beginning characters left unchanged for fuzzy matching.

          • fuzzy_transpositions boolean

            If true, edits for fuzzy matching include transpositions of two adjacent characters (for example, ab to ba).

          • lenient boolean

            If true, format-based errors, such as providing a text value for a numeric field, are ignored.

          • minimum_should_match number | string

            The minimum number of terms that should match, given as an integer, a percentage, or a range.

          • query string Required

            Query string in the simple query string syntax you wish to parse and use for search.

          • quote_field_suffix string

            Suffix appended to quoted text in the query string.
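The caret (^) boost notation described for fields can be sketched as follows. This is a minimal Python example under stated assumptions: the field names `title` and `body`, the boost of 5, and the query text are hypothetical.

```python
import json


def boosted(field, boost):
    """Return a field name with a caret boost, for example title^5."""
    return f"{field}^{boost}"


# Hypothetical field names and query text.
query = {
    "simple_query_string": {
        "query": '"fried eggs" +(eggplant | potato) -frittata',  # required
        "fields": [boosted("title", 5), "body"],
        "default_operator": "and",
        "minimum_should_match": "75%",
        "lenient": True,
    }
}
print(json.dumps(query, indent=2))
```

Here matches in `title` would count five times as much toward the relevance score as matches in `body`.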

        • span_containing object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • big object Required
            • span_gap object

              Can only be used as a clause in a span_near query.

            • span_term object

              The equivalent of the term query but for use with other span queries.

          • little object Required
            • span_gap object

              Can only be used as a clause in a span_near query.

            • span_term object

              The equivalent of the term query but for use with other span queries.

        • span_field_masking object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • field string Required

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • query object Required
            • span_gap object

              Can only be used as a clause in a span_near query.

            • span_term object

              The equivalent of the term query but for use with other span queries.

        • span_first object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • end number Required

            Controls the maximum end position permitted in a match.

          • match object Required
            • span_gap object

              Can only be used as a clause in a span_near query.

            • span_term object

              The equivalent of the term query but for use with other span queries.

        • span_multi object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • match object Required

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

        • span_near object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • clauses array[object] Required

            Array of one or more other span type queries.

          • in_order boolean

            Controls whether matches are required to be in-order.

          • slop number

            Controls the maximum number of intervening unmatched positions permitted.

        • span_not object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • dist number

            The number of tokens from within the include span that can’t have overlap with the exclude span. Equivalent to setting both pre and post.

          • exclude object Required
            • span_gap object

              Can only be used as a clause in a span_near query.

            • span_term object

              The equivalent of the term query but for use with other span queries.

          • include object Required
            • span_gap object

              Can only be used as a clause in a span_near query.

            • span_term object

              The equivalent of the term query but for use with other span queries.

          • post number

            The number of tokens after the include span that can’t have overlap with the exclude span.

          • pre number

            The number of tokens before the include span that can’t have overlap with the exclude span.
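The relationship between dist, pre, and post in span_not can be sketched in Python. The helper names, the field name `text`, and the terms are hypothetical; the sketch only relies on the statement above that dist is equivalent to setting both pre and post.

```python
import json


def span_term(field, value):
    """Build a span_term clause for one field and one term."""
    return {"span_term": {field: value}}


def span_not(include, exclude, pre=0, post=0):
    """Build a span_not query with explicit pre and post token windows."""
    return {
        "span_not": {
            "include": include,  # required
            "exclude": exclude,  # required
            "pre": pre,          # tokens before the include span
            "post": post,        # tokens after the include span
        }
    }


def span_not_with_dist(include, exclude, dist):
    """Expand dist into the equivalent pre and post values."""
    return span_not(include, exclude, pre=dist, post=dist)


# Hypothetical field and terms.
query = span_not_with_dist(span_term("text", "quick"), span_term("text", "lazy"), dist=2)
print(json.dumps(query, indent=2))
```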

        • span_or object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • clauses array[object] Required

            Array of one or more other span type queries.

        • span_term object

          Matches spans containing a term.

          External documentation
        • span_within object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • big object Required
            • span_gap object

              Can only be used as a clause in a span_near query.

            • span_term object

              The equivalent of the term query but for use with other span queries.

          • little object Required
            • span_gap object

              Can only be used as a clause in a span_near query.

            • span_term object

              The equivalent of the term query but for use with other span queries.

        • sparse_vector object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • field string Required

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • query string

            The query text you want to use for search. If inference_id is specified, query must also be specified.

          • prune boolean Technical preview; Added in 8.15.0

            Whether to perform pruning, omitting the non-significant tokens from the query to improve query performance. If prune is true but the pruning_config is not specified, pruning will occur but default values will be used. Default: false

          • pruning_config object
          • query_vector object

            Dictionary of precomputed sparse vectors and their associated weights. Only one of inference_id or query_vector may be supplied in a request.

          • inference_id string
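The two constraints stated for sparse_vector (query is required with inference_id, and inference_id and query_vector are mutually exclusive) can be checked on the client side before sending a request. The sketch below is one possible way to do that in Python; the helper name, the field name `ml.tokens`, and the token weights are hypothetical.

```python
import json


def sparse_vector_query(field, query=None, inference_id=None,
                        query_vector=None, prune=None):
    """Build a sparse_vector query, enforcing the documented constraints."""
    if inference_id is not None and query_vector is not None:
        raise ValueError("supply only one of inference_id or query_vector")
    if inference_id is not None and query is None:
        raise ValueError("query is required when inference_id is specified")

    body = {"field": field}  # required
    optional = (
        ("query", query),
        ("inference_id", inference_id),
        ("query_vector", query_vector),
        ("prune", prune),
    )
    for key, value in optional:
        if value is not None:
            body[key] = value
    return {"sparse_vector": body}


# Hypothetical precomputed token weights.
precomputed = sparse_vector_query(
    "ml.tokens",
    query_vector={"token_a": 0.8, "token_b": 0.3},
    prune=True,
)
print(json.dumps(precomputed, indent=2))
```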
        • term object

          Returns documents that contain an exact term in a provided field. To return a document, the query term must exactly match the queried field's value, including whitespace and capitalization.

          External documentation
        • terms object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
        • terms_set object

          Returns documents that contain a minimum number of exact terms in a provided field. To return a document, a required number of terms must exactly match the field values, including whitespace and capitalization.

          External documentation
        • text_expansion object Deprecated Generally available; Added in 8.8.0

          Uses a natural language processing model to convert the query text into a list of token-weight pairs which are then used in a query against a sparse vector or rank features field.

          External documentation
        • weighted_tokens object Deprecated Generally available; Added in 8.13.0

          Supports returning text_expansion query results by sending in precomputed tokens with the query.

          External documentation
        • wildcard object

          Returns documents that contain terms matching a wildcard pattern.

          External documentation
        • wrapper object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • query string Required

            A base64-encoded query. The binary data can be encoded as JSON, YAML, CBOR, or SMILE.
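A minimal Python sketch of producing the base64 string for a wrapper query, assuming the inner query is serialized as JSON. The inner term query, with its field name and value, is a hypothetical placeholder.

```python
import base64
import json

# Hypothetical inner query; the field name and value are placeholders.
inner_query = {"term": {"user.id": "kimchy"}}

# Serialize to JSON, then base64-encode the UTF-8 bytes.
encoded = base64.b64encode(json.dumps(inner_query).encode("utf-8")).decode("ascii")

wrapper_query = {"wrapper": {"query": encoded}}
print(json.dumps(wrapper_query, indent=2))
```

Decoding the string with `base64.b64decode` and parsing the result as JSON should give back the original inner query.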

        • type object
          • boost number

            Floating point number used to decrease or increase the relevance scores of the query. Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

          • _name string
          • value string Required
      • runtime_mappings object
        • * object Additional properties
          • fields object

            For type composite

            • * object Additional properties
              • type string Required

                Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

          • fetch_fields array[object]

            For type lookup

            • field string Required

              Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

            • format string
          • format string

            A custom format for date type runtime fields.

          • input_field string

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • target_field string

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • target_index string
          • script object
            • id string
            • params object

              Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

              • * object Additional properties
            • lang string

              Any of:

              Values are painless, expression, mustache, or java.

            • options object
              • * string Additional properties
          • type string Required

            Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.
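As an illustration of a runtime_mappings entry, the Python sketch below builds one scripted runtime field and checks its type against the values listed above. The runtime field name `price_eur`, the source field `price`, the parameter `rate`, and the Painless expression are hypothetical.

```python
import json

# Type values as listed in the reference above.
RUNTIME_FIELD_TYPES = {
    "boolean", "composite", "date", "double", "geo_point",
    "geo_shape", "ip", "keyword", "long", "lookup",
}


def runtime_field(field_type, source, params=None):
    """Build one runtime field definition with a Painless script."""
    if field_type not in RUNTIME_FIELD_TYPES:
        raise ValueError(f"unsupported runtime field type: {field_type!r}")
    script = {"lang": "painless", "source": source}
    if params:
        # Named parameters avoid hard-coding values in the script source.
        script["params"] = params
    return {"type": field_type, "script": script}  # type is required


# Hypothetical field names and conversion rate.
runtime_mappings = {
    "price_eur": runtime_field(
        "double",
        "emit(doc['price'].value * params.rate)",
        params={"rate": 0.92},
    )
}
print(json.dumps({"runtime_mappings": runtime_mappings}, indent=2))
```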

      • _source object
        • includes array[string]

          An array of strings that defines the fields that will be included in the analysis.

        • excludes array[string]

          An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.

    • analysis object Required
      • classification object
        • alpha number

          Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.

        • dependent_variable string Required

          Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.

        • downsample_factor number

          Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.

        • early_stopping_enabled boolean

          Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.

        • eta number

          Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.

        • eta_growth_rate_per_tree number

          Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.

        • feature_bag_fraction number

          Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.

        • feature_processors array[object]

          Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.

          • frequency_encoding object
          • multi_encoding object
          • n_gram_encoding object
          • one_hot_encoding object
          • target_mean_encoding object
        • gamma number

          Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

        • lambda number

          Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

        • max_optimization_rounds_per_hyperparameter number

          Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.

        • max_trees number

          Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.

        • num_top_feature_importance_values number

          Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.

        • prediction_field_name string

          Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

        • randomize_seed number

          Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).

        • soft_tree_depth_limit number

          Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.

        • soft_tree_depth_tolerance number

          Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.

        • training_percent string | number

        • class_assignment_objective string
        • num_top_classes number

          Defines the number of categories for which the predicted probabilities are reported. It must be non-negative or -1. If it is -1 or greater than the total number of categories, probabilities are reported for all categories; if you have a large number of categories, there could be a significant effect on the size of your destination index. NOTE: To use the AUC ROC evaluation method, num_top_classes must be set to -1 or a value greater than or equal to the total number of categories.
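The value ranges stated for the classification attributes above can be checked before a preview request is sent. The following Python sketch shows one way to do so while assembling a request body. The index name `my-index`, the dependent variable `label`, and the chosen option values are hypothetical, and treating the bounds for eta and eta_growth_rate_per_tree as inclusive is an assumption.

```python
import json

# Range checks taken from the attribute descriptions above.
# Inclusive bounds for eta and eta_growth_rate_per_tree are an assumption.
RANGE_CHECKS = {
    "alpha": lambda v: v >= 0,
    "downsample_factor": lambda v: 0 < v <= 1,
    "eta": lambda v: 0.001 <= v <= 1,
    "eta_growth_rate_per_tree": lambda v: 0.5 <= v <= 2,
    "gamma": lambda v: v >= 0,
    "lambda": lambda v: v >= 0,
    "max_trees": lambda v: v <= 2000,
    "soft_tree_depth_limit": lambda v: v >= 0,
    "soft_tree_depth_tolerance": lambda v: v >= 0.01,
    "num_top_classes": lambda v: v >= -1,
}


def classification_analysis(dependent_variable, options=None):
    """Return a classification object, rejecting out-of-range values."""
    options = dict(options or {})
    for name, value in options.items():
        check = RANGE_CHECKS.get(name)
        if check is not None and not check(value):
            raise ValueError(f"{name}={value!r} is outside the documented range")
    return {"dependent_variable": dependent_variable, **options}


# Hypothetical index, target field, and option values.
preview_body = {
    "config": {
        "source": {"index": "my-index"},
        "analysis": {
            "classification": classification_analysis(
                "label",
                {"eta": 0.1, "max_trees": 500, "num_top_classes": -1},
            )
        },
    }
}
print(json.dumps(preview_body, indent=2))
```

Options that are left out are not sent at all, so any attribute described above as calculated during hyperparameter optimization keeps that default behavior.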

      • outlier_detection object
        • compute_feature_influence boolean

          Specifies whether the feature influence calculation is enabled.

        • feature_influence_threshold number

          The minimum outlier score that a document needs to have in order to calculate its feature influence score. Value range: 0-1.

        • method string

          The method that outlier detection uses. Available methods are lof, ldof, distance_kth_nn, distance_knn, and ensemble. The default value is ensemble, which means that outlier detection uses an ensemble of different methods and normalises and combines their individual outlier scores to obtain the overall outlier score.

        • n_neighbors number

          Defines the value for how many nearest neighbors each method of outlier detection uses to calculate its outlier score. When the value is not set, different values are used for different ensemble members. This default behavior helps improve the diversity in the ensemble; only override it if you are confident that the value you choose is appropriate for the data set.

        • outlier_fraction number

          The proportion of the data set that is assumed to be outlying prior to outlier detection. For example, 0.05 means it is assumed that 5% of values are real outliers and 95% are inliers.

        • standardization_enabled boolean

          If true, the following operation is performed on the columns before computing outlier scores: (x_i - mean(x_i)) / sd(x_i).
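The standardization formula above, (x_i - mean(x_i)) / sd(x_i), can be worked through on a single column in Python. The reference does not say whether sd is the population or the sample standard deviation; this sketch assumes the population form (`statistics.pstdev`), and the column values are made up for illustration.

```python
from statistics import mean, pstdev


def standardize(column):
    """Apply (x - mean) / sd to every value in one column."""
    m = mean(column)
    s = pstdev(column)  # assumption: population standard deviation
    return [(x - m) / s for x in column]


# Made-up column with mean 5 and population standard deviation 2.
column = [2, 4, 4, 4, 5, 5, 7, 9]
print(standardize(column))
```

After standardization every column has mean 0 and unit standard deviation, so features measured on different scales contribute comparably to distance-based outlier scores.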

      • regression object
        • alpha number

          Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.

        • dependent_variable string Required

          Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.

        • downsample_factor number

          Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.

        • early_stopping_enabled boolean

          Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.

        • eta number

          Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.

        • eta_growth_rate_per_tree number

          Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.

        • feature_bag_fraction number

          Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.

        • feature_processors array[object]

          Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.

          • frequency_encoding object
          • multi_encoding object
          • n_gram_encoding object
          • one_hot_encoding object
          • target_mean_encoding object
        • gamma number

          Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

        • lambda number

          Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

        • max_optimization_rounds_per_hyperparameter number

          Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.

        • max_trees number

          Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.

        • num_top_feature_importance_values number

          Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.

        • prediction_field_name string

          Defines the name of the prediction field in the results. Defaults to <dependent_variable>_prediction.

        • randomize_seed number

          Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).

        • soft_tree_depth_limit number

          Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.

        • soft_tree_depth_tolerance number

          Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.

        • training_percent string | number

        • loss_function string

          The loss function used during regression. Available options are mse (mean squared error), msle (mean squared logarithmic error), huber (Pseudo-Huber loss).

        • loss_function_parameter number

          A positive number that is used as a parameter to the loss_function.
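
Two of the descriptions above reduce to simple arithmetic: the effect of eta_growth_rate_per_tree on eta, and the step budget derived from max_optimization_rounds_per_hyperparameter. The sketch below assumes that the growth rate compounds multiplicatively from the initial eta with no upper cap, and that the step budget is a plain product. The function names are hypothetical and this is not the Elasticsearch implementation.

```python
def eta_for_tree(initial_eta, growth_rate, tree_index):
    """Shrinkage for the tree at tree_index (0-based), assuming simple compounding."""
    return initial_eta * growth_rate ** tree_index


def max_optimization_steps(undefined_hyperparameters, rounds_per_hyperparameter):
    """Upper bound on optimization steps: undefined hyperparameters times rounds each."""
    return undefined_hyperparameters * rounds_per_hyperparameter


if __name__ == "__main__":
    # A rate of 1.05 raises eta by 5% for each extra tree.
    for i in range(3):
        print(i, round(eta_for_tree(0.1, 1.05, i), 6))
    # Five hyperparameters left undefined, two rounds each.
    print(max_optimization_steps(5, 2))
```

Under these assumptions, a rate below 1 would shrink eta for each new tree instead of growing it.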

    • model_memory_limit string
    • max_num_threads number
    • analyzed_fields object
      • includes array[string]

        An array of strings that defines the fields that will be included in the analysis.

      • excludes array[string]

        An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.
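
The value ranges stated for the regression hyperparameters can be checked on the client before a preview request is sent. The following is a minimal sketch of such a check combined with a body builder. It assumes that "between" in the descriptions means an inclusive range; the helper names and the field names `number_of_bedrooms` and `listing_id` are hypothetical, and the server remains the authority on what it accepts.

```python
import json

# Ranges copied from the parameter descriptions: (minimum, maximum, minimum_inclusive).
# None means the text states no bound. Inclusive "between" bounds are an assumption.
HYPERPARAMETER_RANGES = {
    "alpha": (0.0, None, True),
    "downsample_factor": (0.0, 1.0, False),
    "eta": (0.001, 1.0, True),
    "eta_growth_rate_per_tree": (0.5, 2.0, True),
    "gamma": (0.0, None, True),
    "lambda": (0.0, None, True),
    "max_trees": (None, 2000, True),
    "soft_tree_depth_limit": (0.0, None, True),
    "soft_tree_depth_tolerance": (0.01, None, True),
}


def check_hyperparameters(params):
    """Return a list of human-readable range violations (empty if none)."""
    errors = []
    for name, value in params.items():
        if name not in HYPERPARAMETER_RANGES:
            continue  # no range is stated in the text for this parameter
        low, high, low_inclusive = HYPERPARAMETER_RANGES[name]
        if low is not None:
            too_low = value < low if low_inclusive else value <= low
            if too_low:
                errors.append(f"{name}={value} is below the documented minimum {low}")
        if high is not None and value > high:
            errors.append(f"{name}={value} is above the documented maximum {high}")
    return errors


def build_preview_body(index, dependent_variable, hyperparameters=None,
                       includes=None, excludes=None):
    """Build a regression preview body, rejecting out-of-range hyperparameters."""
    hyperparameters = dict(hyperparameters or {})
    errors = check_hyperparameters(hyperparameters)
    if errors:
        raise ValueError("; ".join(errors))
    regression = {"dependent_variable": dependent_variable, **hyperparameters}
    config = {"source": {"index": index}, "analysis": {"regression": regression}}
    analyzed_fields = {}
    if includes:
        analyzed_fields["includes"] = list(includes)
    if excludes:
        analyzed_fields["excludes"] = list(excludes)
    if analyzed_fields:
        config["analyzed_fields"] = analyzed_fields
    return {"config": config}


if __name__ == "__main__":
    body = build_preview_body(
        "houses_sold_last_10_yrs",
        "price",
        hyperparameters={"eta": 0.1, "max_trees": 500},
        includes=["price", "number_of_bedrooms"],  # hypothetical field names
        excludes=["listing_id"],                   # hypothetical field name
    )
    print(json.dumps(body, indent=2))
```

Hyperparameters left out of the dictionary are simply omitted from the body, which matches the documented default of letting hyperparameter optimization choose them.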

Responses

  • 200 application/json
    • feature_values array[object] Required

      An array of objects that contain feature name and value pairs. The features have been processed and indicate what will be sent to the model for training.

      • * string Additional properties
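
Because each entry in feature_values is an object of feature name and string value pairs, a response can be inspected with a few lines of code. The sketch below parses a hypothetical response; the feature names and values are made up for illustration and do not come from a real cluster.

```python
import json

# Hypothetical response body; names and values are illustrative only.
SAMPLE_RESPONSE = """
{
  "feature_values": [
    {"number_of_bedrooms": "1", "postcode": "29655", "price": "140.4"},
    {"number_of_bedrooms": "2", "postcode": "29869", "price": "290.1"}
  ]
}
"""


def feature_names(response):
    """Collect every feature name that appears in any row, sorted for stable output."""
    names = set()
    for row in response["feature_values"]:
        names.update(row)
    return sorted(names)


def column(response, name):
    """Return the values of one feature across all rows (None where absent)."""
    return [row.get(name) for row in response["feature_values"]]


if __name__ == "__main__":
    parsed = json.loads(SAMPLE_RESPONSE)
    print(feature_names(parsed))
    print(column(parsed, "price"))
```

The values are kept as strings, as declared in the response schema; convert them explicitly if numeric handling is needed.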
POST _ml/data_frame/analytics/_preview
{
  "config": {
    "source": {
      "index": "houses_sold_last_10_yrs"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "price"
      }
    }
  }
}
curl \
 --request POST 'http://api.example.com/_ml/data_frame/analytics/_preview' \
 --header "Content-Type: application/json" \
 --data '{"config": {"source": {"index": "houses_sold_last_10_yrs"}, "analysis": {"regression": {"dependent_variable": "price"}}}}'
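
The same request can be assembled with the Python standard library. This is a minimal sketch that only builds the request object and does not send it; the base URL `http://localhost:9200` and the helper name are assumptions, and authentication is left out.

```python
import json
import urllib.request


def build_preview_request(base_url, body, job_id=None):
    """Build a POST request for the preview endpoint, with or without a job ID."""
    path = "/_ml/data_frame/analytics/"
    path += f"{job_id}/_preview" if job_id else "_preview"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = {
        "config": {
            "source": {"index": "houses_sold_last_10_yrs"},
            "analysis": {"regression": {"dependent_variable": "price"}},
        }
    }
    request = build_preview_request("http://localhost:9200", body)  # assumed host
    print(request.get_method(), request.full_url)
    # To send it against a reachable cluster: urllib.request.urlopen(request)
```

Serializing the body with `json.dumps` sends a JSON object, not a JSON-encoded string, which is what the endpoint expects.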
