What is the estimated sample size?
Updated over 10 months ago

The estimated sample size feature in the Select Variables step is particularly useful with sparse datasets. Real-life datasets often have many variables and missing data, while many machine learning models need complete records with no missing values or nulls. Business analysts often run into this conundrum: they select a broad set of independent variables, and because few records have data for all of those variables at once, they end up with a very small sample size.
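To make the shrinking effect concrete, here is a minimal Python sketch. It is not how the product computes its estimate; the toy dataset, the variable names, and the `complete_case_count` helper are all hypothetical, and it simply assumes that a record is usable only when every selected variable has a value.

```python
# Hypothetical toy dataset: None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "region": "West"},
    {"age": None, "income": 61000, "region": "East"},
    {"age": 45, "income": None, "region": None},
    {"age": 29, "income": 48000, "region": None},
    {"age": 51, "income": None, "region": "South"},
]


def complete_case_count(rows, variables):
    """Count records that have a value for every selected variable."""
    return sum(all(row[v] is not None for v in variables) for row in rows)


# Each added variable can only keep or reduce the number of usable records.
print(complete_case_count(rows, ["age"]))                      # 4
print(complete_case_count(rows, ["age", "income"]))            # 2
print(complete_case_count(rows, ["age", "income", "region"]))  # 1
```

Each variable on its own is mostly populated here, yet only one of the five records is complete across all three.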

A common strategy for dealing with this problem is to impute missing values, i.e. fill in the blanks, using zeros, median values, or simply 'Unknown'. You can select a specific in-fill strategy for each variable using the Missing Values field in the variables list. Clicking the cycle icon next to the Estimated sample size refreshes the estimate of how many records will be processed by the analytics engine. This lets you try different in-fill settings while monitoring their impact on sample size.
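The effect of in-fill settings on sample size can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the analytics engine's actual logic: the dataset, the strategy names (`"none"`, `"zero"`, `"median"`, `"unknown"`), and the `fill` and `estimated_sample_size` helpers are all hypothetical.

```python
from statistics import median

# Hypothetical toy dataset: None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "region": "West"},
    {"age": None, "income": 61000, "region": "East"},
    {"age": 45, "income": None, "region": None},
    {"age": 29, "income": 48000, "region": None},
    {"age": 51, "income": None, "region": "South"},
]


def fill(rows, variable, strategy):
    """Return a copy of rows with one variable's missing values filled in."""
    if strategy == "none":
        return [dict(r) for r in rows]
    if strategy == "zero":
        value = 0
    elif strategy == "median":
        value = median(r[variable] for r in rows if r[variable] is not None)
    elif strategy == "unknown":
        value = "Unknown"
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [
        {**r, variable: value if r[variable] is None else r[variable]}
        for r in rows
    ]


def estimated_sample_size(rows, strategies):
    """Apply each variable's in-fill strategy, then count complete records."""
    filled = rows
    for variable, strategy in strategies.items():
        filled = fill(filled, variable, strategy)
    return sum(all(r[v] is not None for v in strategies) for r in filled)


# No in-fill: only fully populated records survive.
print(estimated_sample_size(
    rows, {"age": "none", "income": "none", "region": "none"}))       # 1
# Filling income with the median recovers one more record.
print(estimated_sample_size(
    rows, {"age": "none", "income": "median", "region": "none"}))     # 2
# Also filling region with 'Unknown' leaves only missing ages as a blocker.
print(estimated_sample_size(
    rows, {"age": "none", "income": "median", "region": "unknown"}))  # 4
```

Changing one strategy at a time and re-running the count mirrors the workflow of adjusting a Missing Values setting and then refreshing the estimate.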

