This functionality is experimental and may be changed or removed completely in a future release. Elastic will take a best effort approach to fix any issues, but experimental features are not subject to the support SLA of official GA features.
The following limitations and known problems apply to the 7.5.0 release of the Elastic data frame analytics feature:
Cross-cluster search is not supported
Cross-cluster search is not supported for data frame analytics.
Deleting a data frame analytics job does not delete the destination index
The delete data frame analytics job API does not delete the destination index that contains the annotated data of the data frame analytics. That index must be deleted separately.
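As a rough illustration, a full cleanup therefore takes two separate REST calls. The sketch below only builds the method and path for each call and does not send anything; the helper name `cleanup_requests` and the example job and index names are hypothetical.

```python
from typing import List, Tuple


def cleanup_requests(job_id: str, dest_index: str) -> List[Tuple[str, str]]:
    """Return the (method, path) pairs needed to remove a job and its results."""
    return [
        # Removes the job configuration only; the destination index survives.
        ("DELETE", f"/_ml/data_frame/analytics/{job_id}"),
        # Separate call that removes the destination index itself.
        ("DELETE", f"/{dest_index}"),
    ]


# Hypothetical job and index names, used only for illustration.
for method, path in cleanup_requests("house-prices", "house-prices-results"):
    print(method, path)
```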
Data frame analytics jobs cannot be updated
You cannot update data frame analytics configurations. Instead, delete the data frame analytics job and create a new one.
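One way to picture the delete-and-recreate workflow is the sketch below. It assumes you still have the request body that was used to create the job, and it only assembles the two requests rather than sending them. The helper name `recreate_requests` and the example job, index names, and analysis settings are illustrative assumptions.

```python
import copy
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]


def recreate_requests(
    job_id: str,
    current_body: Dict[str, Any],
    new_analysis: Dict[str, Any],
) -> List[Request]:
    """Build the DELETE and PUT requests that stand in for an 'update'."""
    body = copy.deepcopy(current_body)  # leave the caller's dict untouched
    body["analysis"] = new_analysis     # the only part changed in this sketch
    path = f"/_ml/data_frame/analytics/{job_id}"
    return [("DELETE", path, {}), ("PUT", path, body)]


# Hypothetical job body, used only for illustration.
old_body = {
    "source": {"index": "sensor-data"},
    "dest": {"index": "sensor-data-outliers"},
    "analysis": {"outlier_detection": {}},
}
for request in recreate_requests(
    "sensor-outliers", old_body, {"outlier_detection": {"n_neighbors": 20}}
):
    print(request)
```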
Data frame memory limitation
Data frame analytics can only analyze data frames that fit into the memory limit dedicated to machine learning processes. For general machine learning settings, see Machine learning settings in Elasticsearch.
Data frame analytics jobs runtime may vary
The runtime of data frame analytics jobs depends on numerous factors, such as the number of data points in the dataset, the type of analytics, the number of fields included in the analysis, the supplied hyperparameters, and the types of the analyzed fields. For example, running an analysis on a dataset with many numerical fields will take longer than running an analysis on a dataset that contains mainly categorical fields. Supplying hyperparameters yourself also lowers the runtime. For this reason, no general runtime value applies to all or most situations. A data frame analytics job may take from a couple of minutes up to 35 hours in extreme cases.
The runtime increases nearly linearly with the number of analyzed fields. For datasets of more than 100,000 points, we recommend starting with a low training percent and running a few data frame analytics jobs to see how the runtime scales with the number of data points and how the quality of results scales with the training percentage.
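The recommendation to start small could be sketched as a series of job configurations that differ only in `training_percent`. The helper name `ramp_configs`, the job naming scheme, the chosen percentages, and the example index and field names are all assumptions made for illustration; the code builds request bodies and sends nothing.

```python
from typing import Any, Dict, Sequence


def ramp_configs(
    source_index: str,
    dependent_variable: str,
    percents: Sequence[int] = (5, 10, 25, 50),
) -> Dict[str, Dict[str, Any]]:
    """Map a job ID to a regression job body for each training percentage."""
    configs: Dict[str, Dict[str, Any]] = {}
    for pct in percents:
        job_id = f"{source_index}-regression-{pct}pct"
        configs[job_id] = {
            "source": {"index": source_index},
            # One destination index per job keeps the results comparable.
            "dest": {"index": f"{job_id}-dest"},
            "analysis": {
                "regression": {
                    "dependent_variable": dependent_variable,
                    "training_percent": pct,
                }
            },
        }
    return configs


# Hypothetical index and field names.
for job_id, body in ramp_configs("flights", "delay_minutes").items():
    print(job_id, body["analysis"]["regression"]["training_percent"])
```

Running the jobs in order of increasing percentage and noting each runtime gives a rough picture of how the analysis scales before committing to the full dataset.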
Documents with missing values in analyzed fields are skipped
If a feature field (a field that is subject to the data frame analytics) has a missing value in a document, that document is skipped during the analysis.
Outlier detection field types
Outlier detection requires numeric or boolean data to analyze; fields that have any other data type are ignored. The algorithms also don’t support missing values (see also Documents with missing values in analyzed fields are skipped), so documents where included fields contain missing values, null values, or an array are ignored as well. As a result, a destination index may contain documents that don’t have an outlier score. These documents are still reindexed from the source index to the destination index, but they are not included in the outlier detection analysis and therefore no outlier score is computed.
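The document-level rule can be modeled with a small check. This is a simplified sketch and not how Elasticsearch implements it. It assumes `included_fields` already lists only the numeric or boolean fields selected for the analysis, and the function name `is_scored` is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def is_scored(doc: Dict[str, Any], included_fields: Iterable[str]) -> bool:
    """Return True if every included field holds a single, non-null value."""
    for field in included_fields:
        value = doc.get(field)  # a missing field and an explicit null both give None
        if value is None:
            return False
        if isinstance(value, (list, tuple)):  # arrays are not analyzed
            return False
    return True


docs: List[Dict[str, Any]] = [
    {"cpu": 0.7, "alert": True},      # complete: would receive an outlier score
    {"cpu": None, "alert": False},    # null value: skipped
    {"cpu": [0.1, 0.2], "alert": True},  # array value: skipped
    {"alert": True},                  # missing field: skipped
]
print([is_scored(d, ["cpu", "alert"]) for d in docs])
```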
Regression field types
Regression supports fields that have numeric, boolean, text, keyword, or ip data types. It is also tolerant of missing values. Supported fields are included in the analysis; other fields are ignored. Documents where included fields contain an array are also ignored. Documents in the destination index that don’t contain a results field are not included in the regression analysis.
Classification field types
Classification supports fields that have numeric, boolean, text, keyword, or ip data types. It is also tolerant of missing values. Supported fields are included in the analysis; other fields are ignored. Documents where included fields contain an array are also ignored. Documents in the destination index that don’t contain a results field are not included in the classification analysis.
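To see which documents were left out of a regression or classification analysis, one option is to search the destination index for documents that lack the results field. The sketch below builds such a search body. It assumes the results are stored under a top-level object named `ml`; pass a different name if the job uses another results field. The helper name `missing_results_query` is hypothetical.

```python
import json
from typing import Any, Dict


def missing_results_query(results_field: str = "ml") -> Dict[str, Any]:
    """Build a search body that matches documents without the results field."""
    return {
        "query": {
            "bool": {
                # must_not + exists selects documents where the field is absent.
                "must_not": [{"exists": {"field": results_field}}]
            }
        }
    }


# Body for a search against the destination index.
print(json.dumps(missing_results_query(), indent=2))
```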