
Predicting numerical values with regression

Regression analysis is a supervised machine learning process that estimates the relationships among different fields in your data, then uses those relationships to predict numerical values. For example, you can use historical data to predict the response time of a web request or the approximate amount of data that a server exchanges with a client.

When you perform regression analysis, you must identify a subset of fields that you want to use to create a model for predicting other fields. Feature variables are the fields that are used to create the model. The dependent variable is the field you want to predict.
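The split between feature variables and the dependent variable can be illustrated with a minimal Python sketch. The records and field names (url_length, method, cached, response_time_ms) are hypothetical and chosen only to match the web request example; they are not taken from any real data set.

```python
# Hypothetical historical records of web requests (illustrative field names).
records = [
    {"url_length": 42, "method": "GET", "cached": True, "response_time_ms": 120.0},
    {"url_length": 87, "method": "POST", "cached": False, "response_time_ms": 340.5},
    {"url_length": 15, "method": "GET", "cached": False, "response_time_ms": 210.0},
]

# The field to predict, and the subset of fields used to build the model.
DEPENDENT_VARIABLE = "response_time_ms"
FEATURE_VARIABLES = ["url_length", "method", "cached"]


def split_record(record):
    """Separate one record into (feature variables, dependent variable)."""
    features = {name: record[name] for name in FEATURE_VARIABLES}
    target = record[DEPENDENT_VARIABLE]
    return features, target


# X holds the feature variables per record, y the values to predict.
X, y = zip(*(split_record(r) for r in records))
```

A model would then be trained on X to reproduce y, and later applied to new records for which the dependent variable is unknown.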

Regression algorithms

Regression uses an ensemble learning technique that is similar to extreme gradient boosting (XGBoost), which combines decision trees with gradient boosting methodologies. XGBoost trains a sequence of decision trees, and each new tree learns from the mistakes of the forest built so far. In each iteration, the trees added to the forest improve the decision quality of the combined decision forest. By default, the regression algorithm minimizes a loss function called mean squared error (MSE).
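The idea of each tree learning from the mistakes of the forest so far can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the actual algorithm: it assumes a single numerical feature, uses one-split trees (decision stumps), and the names fit_boosted, n_trees, and learning_rate are hypothetical. With mean squared error as the loss, the "mistakes" each new tree fits are simply the residuals of the current predictions.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split tree: pick the threshold that minimizes squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_trees=20, learning_rate=0.5):
    """Grow a forest in which each stump fits the residuals of the forest so far."""
    base = mean(ys)  # initial prediction: the mean of the dependent variable
    predictions = [base] * len(xs)
    forest = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        forest.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, forest, learning_rate


def predict(model, x):
    base, forest, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in forest)


def mse(ys, predictions):
    """Mean squared error between observed and predicted values."""
    return mean((y - p) ** 2 for y, p in zip(ys, predictions))


# Made-up training data: payload size (KB) versus response time (ms).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10.0, 12.0, 15.0, 30.0, 32.0, 35.0, 60.0, 62.0]

model = fit_boosted(xs, ys)
baseline_mse = mse(ys, [mean(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(f"baseline MSE: {baseline_mse:.2f}, boosted MSE: {boosted_mse:.2f}")
```

The learning rate below 1 means each tree corrects only part of the remaining error, which is a common choice in gradient boosting to reduce overfitting. Production implementations add deeper trees, multiple features, and regularization.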

There are three types of feature variables that you can use with these algorithms: numerical, categorical, or Boolean. Arrays are not supported.
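As a rough illustration of these three types, the following Python sketch classifies a field value and rejects arrays. The function name feature_type and the mapping from Python types to feature types (for example, treating strings as categorical) are assumptions made for this example only.

```python
def feature_type(value):
    """Classify a field value as a supported feature variable type."""
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numerical"
    if isinstance(value, str):
        return "categorical"
    if isinstance(value, (list, tuple, set)):
        raise TypeError("array-valued fields are not supported as feature variables")
    raise TypeError(f"unsupported field type: {type(value).__name__}")


# Hypothetical record, used only to exercise the function.
record = {"url_length": 42, "method": "GET", "cached": True}
types = {name: feature_type(value) for name, value in record.items()}
print(types)
```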

1. Define the problem

Regression is useful when you need to predict a continuous quantity; the values that regression analysis predicts are always numerical. If your use case requires predicting continuous, numerical values, then regression might be a suitable choice for you.