
What Is the Difference Between Classification and Regression?
Classification and regression are two families of machine learning methods. At a high level, classification is used to categorise data, while regression is used to predict numerical values. Classification and regression are the two most important types of supervised learning, which you can learn more about in my article on supervised and unsupervised learning.
In this article, we’ll take a closer look at supervised learning and explore how classification and regression work in practice.
What Is Supervised Learning?
Supervised learning is based on labelled data, where a model is trained to predict the label for new, unseen data. So how does this apply to classification and regression?
What Is Classification?
Classification deals with data that is labelled with discrete categories. The goal is to train a model that predicts the label based solely on the input data.
For example, imagine you have a dataset consisting of images of cats and dogs. Classification involves defining “cat” and “dog” as discrete categories and training a machine learning model to categorise images as one or the other.
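To make the cat-and-dog example concrete, here is a minimal sketch of a classifier. It is only an illustration under stated assumptions: instead of real images, each animal is described by two made-up features (weight and ear length), and the model is a simple nearest-centroid classifier chosen for brevity. The data, the feature choice, and the function names fit_centroids and classify are all hypothetical.

```python
from math import dist

# Hypothetical training data: (weight_kg, ear_length_cm) -> label.
# A real image classifier would learn from pixels; two hand-made
# features keep the sketch small.
training_data = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((30.0, 12.5), "dog"),
    ((12.0, 10.0), "dog"),
]


def fit_centroids(samples):
    """Training step: compute the mean feature vector of each category."""
    sums = {}
    counts = {}
    for features, label in samples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in total)
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Prediction step: return the category whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(training_data)
print(classify(centroids, (4.5, 6.8)))    # a new, unseen animal
print(classify(centroids, (25.0, 12.0)))  # another unseen animal
```

Whatever the input, the output is always one of the predefined categories seen during training, never a value in between.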
What Is Regression?
Regression uses machine learning to learn a mathematical relationship between different values. Unlike classification, which relies on fixed categories, regression works with values on a continuous scale, such as length or temperature. A regression model predicts one value based on others. Linear regression and polynomial regression are well-known examples, but other types of models, such as decision trees, can also be used for regression.
A simple example could be data on sold homes, including their living area and sale price. Regression can be used to identify a mathematical relationship between size and price, which can then be used to predict the price of homes that have not yet been sold.
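The home-price example can be sketched as a simple linear regression fitted with ordinary least squares. This is a minimal illustration, not a production model: the sale figures are invented, the units (square metres, an unspecified currency) are assumptions, and the function names fit_line and predict_price are hypothetical.

```python
# Hypothetical sold homes: (living area in m^2, sale price).
# The figures are made up for illustration.
sold_homes = [
    (50, 165_000),
    (70, 215_000),
    (90, 285_000),
    (110, 335_000),
    (130, 400_000),
]


def fit_line(points):
    """Fit price = slope * area + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(slope, intercept, area):
    """Apply the learned relationship to a home that has not been sold yet."""
    return slope * area + intercept


slope, intercept = fit_line(sold_homes)
print(f"price is roughly {slope:.0f} * area + {intercept:.0f}")
print(predict_price(slope, intercept, 100))  # estimate for an unsold 100 m^2 home
```

Here the learned relationship is a straight line, so the model can return a price for any living area, including sizes that never appeared in the data.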
How Do They Compare?
In summary, the differences can be described as follows:
Classification
Data: Divided into predefined, discrete categories
Goal: Assign an input to one of the categories
Model output: A specific category from a set
Regression
Data: Continuous numerical data with an underlying relationship
Goal: Predict one variable based on others
Model output: A numerical value
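The difference in model output can also be seen side by side in code. The sketch below uses a k-nearest-neighbours approach, picked here only because the same idea can serve both tasks: a majority vote gives a category, while an average gives a number. The dataset, the labels "apartment" and "house", and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical labelled homes: (living area in m^2, category, sale price).
homes = [
    (45, "apartment", 150_000),
    (60, "apartment", 190_000),
    (75, "apartment", 230_000),
    (120, "house", 360_000),
    (150, "house", 450_000),
    (180, "house", 520_000),
]


def nearest(area, k=3):
    """Return the k homes whose living area is closest to the given area."""
    return sorted(homes, key=lambda home: abs(home[0] - area))[:k]


def predict_category(area, k=3):
    """Classification: majority vote over the neighbours' categories."""
    votes = Counter(kind for _, kind, _ in nearest(area, k))
    return votes.most_common(1)[0][0]


def predict_price(area, k=3):
    """Regression: average of the neighbours' sale prices."""
    neighbours = nearest(area, k)
    return sum(price for _, _, price in neighbours) / len(neighbours)


print(predict_category(65))  # one category from a fixed set
print(predict_price(65))     # a numerical value
```

Both functions take the same input; only the kind of label they learn from, and therefore the kind of answer they return, differs.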

