What is outlier removal?

This article discusses how the G2M platform removes outliers when the outlier removal option is selected.


Real-world datasets often include outliers, i.e. data points that are far outside the distribution of the dataset under consideration. In most cases these outliers should be included in the analysis as they more often than not reflect the real outcomes you are trying to analyze and predict.

However, in the case of clustering, outliers can throw off the clustering solution by becoming a cluster or series of clusters themselves. If so, you may find it useful to exclude outliers prior to clustering so your resulting clusters are not cluttered by what you know to be noise. When the outlier removal setting is turned on, the G2M platform removes the 5% of the dataset deemed most "outside" the norm, using kNN (k-nearest-neighbor, proximity-based) anomaly detection.
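
To make the idea concrete, here is a minimal sketch of proximity-based outlier removal in plain Python. It is an illustration only, not the G2M platform's actual implementation, which this article does not describe in detail. The function names, the choice of k = 5 neighbors, and the scoring rule (mean distance to the k nearest neighbors) are all assumptions made for the example; only the 5% removal fraction comes from the description above.

```python
import math


def knn_outlier_scores(points, k=5):
    """Score each point by its mean distance to its k nearest neighbors.

    Hypothetical helper: a larger score means the point sits farther
    from its neighbors, i.e. it looks more like an outlier.
    """
    if len(points) <= k:
        raise ValueError("need more than k points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def remove_outliers(points, k=5, fraction=0.05):
    """Drop the `fraction` of points with the highest kNN scores.

    Returns (kept_points, removed_indices). The 5% default mirrors the
    setting described in this article; k=5 is an assumed value.
    """
    scores = knn_outlier_scores(points, k)
    n_remove = round(fraction * len(points))
    # Indices ordered from most to least outlying.
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:n_remove])
    kept = [p for i, p in enumerate(points) if i not in removed]
    return kept, sorted(removed)


# Example: a tight 19 x 2 grid of 38 points plus two far-away points.
data = [(float(x), float(y)) for x in range(19) for y in range(2)]
data += [(100.0, 100.0), (-50.0, 80.0)]

kept, removed = remove_outliers(data)
print(len(kept), removed)  # the two far-away points (indices 38 and 39) are dropped
```

Note that a fixed removal fraction always drops 5% of the rows, whether or not the dataset contains obvious outliers, so the points removed from a clean dataset are simply the ones on its sparsest edges.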
