What variables should I select?

This article discusses what variables you should select when building a machine learning model in the G2M platform.


When building a model, the first step of the Train stage is to select variables. Depending on the type of model you are building, you will need to select some or all of the following variables:

  • Independent variables: all models require at least one or two independent variables. These are the inputs your model uses to make a prediction. Independent variables can be of any type (numerical, categorical, boolean), but the type must be known. The G2M platform detects variable types automatically when you select independent variables. In the instances where a type cannot be detected, set it manually using the dropdown in the variables list before selecting that variable as an independent variable.

  • Index variable: designating an index variable is not strictly required but is strongly recommended. It is used to identify specific data records throughout the modeling process, from data ingestion to training and prediction. It provides an audit trail that ensures you always know which record you are looking at once model predictions are generated and used for any downstream purpose.

  • Dependent variable: when building a propensity or a regression model, you will need to select a dependent variable that quantifies the outcome you are trying to predict. For propensity models, the dependent variable should be boolean. For regression models, the dependent variable should be numerical.

  • Treatment variable: when building an A/B Testing model, you will need to select the treatment variable identifying which group of records was treated ("A") and which group was held as control ("B"). The treatment variable should be boolean.
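
If you prepare your dataset outside the platform, it can help to check these rules before uploading. The snippet below is a minimal sketch in Python with pandas and does not use any G2M API: the helper names (variable_kind, check_selection), the model type labels ("propensity", "regression", "ab_testing"), and the sample columns are all hypothetical. It also assumes that an index variable should be unique per record, which the platform may or may not enforce.

```python
import pandas as pd
from pandas.api import types as pdt


def variable_kind(series: pd.Series) -> str:
    """Rough type guess: 'boolean', 'numerical', 'categorical' or 'unknown'."""
    values = series.dropna()
    if values.empty:
        return "unknown"  # nothing to infer a type from
    if pdt.is_bool_dtype(values):  # check before numeric: bool also counts as numeric
        return "boolean"
    if pdt.is_numeric_dtype(values):
        return "numerical"
    if (
        isinstance(values.dtype, pd.CategoricalDtype)
        or pdt.is_object_dtype(values)
        or pdt.is_string_dtype(values)
    ):
        return "categorical"
    return "unknown"


def check_selection(df, model_type, independent, dependent=None,
                    treatment=None, index=None):
    """Return a list of problems with a variable selection (empty list = OK)."""
    problems = []
    kinds = {col: variable_kind(df[col]) for col in df.columns}

    if not independent:
        problems.append("select at least one independent variable")
    for col in independent:
        if kinds[col] == "unknown":
            problems.append(f"{col}: type unknown, set it manually first")

    # Hypothetical model type labels, used only for this illustration.
    if model_type == "propensity":
        if dependent is None or kinds[dependent] != "boolean":
            problems.append("propensity: dependent variable should be boolean")
    elif model_type == "regression":
        if dependent is None or kinds[dependent] != "numerical":
            problems.append("regression: dependent variable should be numerical")
    elif model_type == "ab_testing":
        if treatment is None or kinds[treatment] != "boolean":
            problems.append("A/B testing: treatment variable should be boolean")

    # Assumption: an index variable should identify each record uniquely.
    if index is not None and not df[index].is_unique:
        problems.append(f"{index}: index values are not unique")
    return problems


# Made-up sample data for illustration only.
df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 51, 27],
    "plan": ["basic", "pro", "basic"],
    "churned": [True, False, True],
    "notes": [None, None, None],  # no values, so no type can be inferred
})

ok = check_selection(df, "propensity", ["age", "plan"],
                     dependent="churned", index="customer_id")
bad = check_selection(df, "regression", ["age", "notes"], dependent="churned")
print(ok)
print(bad)
```

The first call passes every check. The second call reports two problems: the "notes" column has no detectable type, and a boolean column is not a suitable dependent variable for a regression model.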

