Supervised Learning: Linear Regression
This will be a series of blog posts that goes in depth on each of the main types of supervised learning: some of the math and core ideas behind each one, and finally how to apply each of them in Python!
So, firstly, what is supervised learning?
Wikipedia defines it as:
Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labelled training data consisting of a set of training examples.
In other words, the training set (the data you feed to your machine learning algorithm) contains the desired solution within it. If you are trying to predict the price of houses based on square footage, location, number of bedrooms, or any of the multitude of other variables that can affect the price of a house, you already have the desired solution (the price of the house) in the training set.
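As a minimal sketch, a labelled training set for the house-price example might look like this in Python (the column names and numbers here are made up for illustration):

```python
import pandas as pd

# A toy labelled training set: each row pairs the input features with
# the desired solution (the house price) that we want the model to learn.
training_set = pd.DataFrame({
    "square_footage": [1400, 2100, 1700, 2500],
    "bedrooms":       [3, 4, 3, 5],
    "price":          [240_000, 380_000, 290_000, 450_000],  # the label
})

X = training_set[["square_footage", "bedrooms"]]  # independent variables
y = training_set["price"]                         # dependent variable (the label)
```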
Supervised learning is split into two branches:
- Regression
- Classification
For this blog, we will be discussing Regression; in future blogs, we will go over Classification.
Supervised learning: Regression
Firstly, a Regression problem deals with continuous output variables, as opposed to Classification, which deals with discrete data (see https://www.mathsisfun.com/data/data-discrete-continuous.html).
Regression can be simple or multiple, which just means you are using one variable versus many variables to make a prediction.
Both types of regression can be linear or non-linear. Linear regression tells us that a straight line best describes the relationship between the dependent and independent variable(s), while non-linear regression tells us that the relationship between the independent and dependent variable(s) is best explained using a polynomial term.
Linear Regression
Linear Regression tries to map out the optimal fit between our dependent variable (the y-variable, which is what we are trying to predict) and our independent variable (the x-variable, also known as a predictor, because it has an effect on what we are trying to predict). An example of this type of regression is trying to predict the weight or height of a person.
The equation for simple linear regression is y = mx + c. We are trying to predict the data point y (our dependent variable) based on m (the slope of our line) multiplied by the corresponding data point x (our independent variable), plus c, which is where our line crosses the y-axis, known as our y-intercept.
When we are dealing with Multiple Linear Regression, we just repeat the mx term for each variable: y = m₁x₁ + m₂x₂ + … + mₙxₙ + c.
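As a quick sketch of the mechanics (the slope and intercept values here are made up):

```python
import numpy as np

# Simple linear regression: y = m*x + c
m, c = 150.0, 50_000.0                # illustrative slope and intercept
x = 1400                              # e.g. square footage
y_pred = m * x + c

# Multiple linear regression: one slope per variable, same single intercept
m_vec = np.array([150.0, 10_000.0])   # one coefficient per feature
x_vec = np.array([1400, 3])           # square footage, number of bedrooms
y_pred_multi = m_vec @ x_vec + c      # the dot product repeats m*x for each variable
print(y_pred, y_pred_multi)
```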
Now, if you research online, you will usually see the Linear Regression formula written with betas, as ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ. Don't be too overwhelmed; it is the same as the simpler formula above, with β₀ playing the role of c and each βᵢ playing the role of a slope.
Non-linear regression
Non-linear regression is when a straight line does not fit the relationship between our dependent and independent variables; in this case, we have to use polynomial terms to best describe the relationship. Adding a polynomial term, a quadratic (squared) or cubic (cubed) term, turns a linear regression model into a curve.
We will see a good example of this in the Housing Dataset, which shows that the age of a house is not predicted best by a linear regression model; a polynomial represents it better, as older houses tend to sell at a higher price due to being seen as historic or antique, somewhat newer houses tend to decrease in price, and prices shoot up again as houses get more modern.
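One way to sketch this in Python is with scikit-learn's PolynomialFeatures, which adds the squared term for us (the ages and prices below are invented, just to show the pipeline):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Invented data: house age in years vs sale price
age = np.array([[5], [20], [40], [60], [90], [120]])
price = np.array([300_000, 250_000, 200_000, 180_000, 220_000, 310_000])

# Degree-2 polynomial: the model fits price = c + m1*age + m2*age^2,
# which is still linear in the coefficients but draws a curve over age.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(age, price)
print(model.predict(np.array([[75]])))  # predicted price for a 75-year-old house
```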
With both simple linear regression and non-linear regression, you can also perform multiple linear and multiple non-linear regression. These are the same, except you take more independent variables into account: the claim is that a set of independent variables, instead of just one, will better explain how my dependent variable moves and will enable me to make better predictions about it.
What a Linear Regression model seeks to do is minimize the error between the predicted values and the ground-truth values we see in our Training Set. To do this we use Ordinary Least Squares (OLS), which seeks to minimize the Sum of Squared Residuals (SSR).
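For simple linear regression, OLS even has a closed-form solution; here is a minimal NumPy sketch on toy data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])

# OLS closed form for y = m*x + c, which minimizes the SSR
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

residuals = y - (m * x + c)           # actual minus predicted, per point
ssr = np.sum(residuals ** 2)          # the quantity OLS minimizes
print(m, c, ssr)
```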
Before I go forward, I want to explain what a cost function is. A loss function measures the error for a single training example, while the cost function is a metric used to evaluate the performance of our model: it is the average of the loss over all the training examples.
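In code, a squared-error loss and the cost built from it might look like this sketch:

```python
import numpy as np

def squared_loss(y_true, y_pred):
    # Loss: the error for a single training example
    return (y_true - y_pred) ** 2

def mse_cost(y_true, y_pred):
    # Cost: the average of the per-example losses over the whole training set
    return np.mean(squared_loss(y_true, y_pred))

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])
print(mse_cost(y_true, y_pred))  # 0.5
```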
Gradient Descent
Now you may be wondering: how do we minimize the cost function? This is done through an iterative process called Gradient Descent. It works by taking small steps in a direction, using a learning rate (which can be thought of as how much we move by each step), and it keeps doing so as long as the cost keeps decreasing. To reach the minimum, gradient descent needs to know which direction to go in and how big a step to take. There is a lot more math involved; this is just a basic overview of what Gradient Descent is.
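Here is a minimal sketch of gradient descent fitting a simple linear regression (the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])

m, c = 0.0, 0.0          # start from an arbitrary line
learning_rate = 0.01     # how much we move by on each step
n = len(x)

for _ in range(5000):
    y_pred = m * x + c
    # Partial derivatives of the MSE cost with respect to m and c:
    # they give the direction in which the cost increases fastest.
    dm = (-2 / n) * np.sum(x * (y - y_pred))
    dc = (-2 / n) * np.sum(y - y_pred)
    # Step in the opposite (downhill) direction, scaled by the learning rate
    m -= learning_rate * dm
    c -= learning_rate * dc

print(m, c)  # should land very close to the OLS solution above
```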
R²
Finally, we use R² to evaluate how good or bad our model is.
The formula is R² = 1 − SSE(model) / SSE(mean), where SSE(model) is the sum of squared errors between our predictions and the actual values, and SSE(mean) is the sum of squared errors between the mean and the actual values.
Explaining R² is best done visually. Here is an example using a dataset that predicts price based on different variables; in this case, we are trying to predict the SalePrice of a house based on GrLiveArea (the above-grade (ground) living area in square feet).
The graph on the left shows us the squared error, which is the error between our model (the line) and the actual SalePrice, with those values squared.
The graph on the right shows us the difference between the mean value of SalePrice and the actual SalePrice values.
We need both to find the R² of our model.
Now you can see visually what the R² value means: it is simply 1 minus the sum of squared errors in relation to our model, divided by the sum of squared errors in relation to the mean.
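Putting both sums together, a sketch of computing R² by hand (scikit-learn's r2_score gives the same result):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([2.1, 4.3, 6.2, 8.1, 9.8])       # actual values
y_pred = np.array([2.0, 4.0, 6.0, 8.0, 10.0])      # our model's predictions

ss_model = np.sum((y_true - y_pred) ** 2)           # squared error vs the model
ss_mean = np.sum((y_true - y_true.mean()) ** 2)     # squared error vs the mean
r2 = 1 - ss_model / ss_mean

print(r2, r2_score(y_true, y_pred))  # both values match
```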
R² is just telling us how much of the variation in the dependent variable Y is explained by the independent variable X in our model. So if we get an R² of, say, 0.6, we can say that 60% of the variation in the dependent variable is explained by that independent variable.