Linear Regression


Machine Learning / Saturday, February 3rd, 2018

Linear Regression is a supervised learning algorithm used in regression tasks. In this post we’ll see what is regression and understand what linear regression is and how it works.We’ll also implement it with sklearn.

Linear regression is the simplest and most classic technique in these tasks. It is widely used in the cases where there is need to predict numerical values using historical data. It makes prediction using a linear function of the input features.

What is Regression?

In a regression task , the goal is to predict a continuous number ,or a real number. Predicting prices of house from their floor space and age of the house is an example of regression task.Another example is predicting a person’s annual income from their education,their age and where they live.

If there is continuity between possible outcomes ,then the problem is regression problem.

Linear Regression

We’ve a bunch of algorithms to solve a regression tasks like Ridge Regression , Lasso and ElasticNet. Linear regressions are mostly used since it requires less computational power and easy to understand.

The task is to predict scores of one variable(dependent variable) with the help of independent variable and find a straight line which fits through the points as good as possible. This line is called regression line.

Hypothesis equation

For regression , the general prediction formula for a linear model is:

ŷ = w[0]*x[0]+w[1]*x[1]+….. + w[p]*x[p] + b

where, w[i] = weights/parameters

x[i] = input variables

b = bias/constant

ŷ = predicted value

For a dataset with single feature this looks like:

ŷ = w[0] * x[0] + b

You may be familiar with this equation from your high school maths class.

Cost function

Linear regression finds the parameter w and b that minimizes the mean squared error between predictions and the true regression targets ,y ,on training data.

The mean squared error is the sum of squared differences between the predictions and true values.It is a better cost function then the simple difference cost function.

Implementation

Now lets implement this with the help of sklearn. We’ll use  the extended boston housing prices available in mglearn.datasets and try to fit this  in Linear Regression model .

 

#Importing dependencies
import numpy as np
import pandas as pd
import mglearn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
#Loading dataset
X,y=mglearn.datasets.load_extended_boston()
#splitting train and test data
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=0)
#evaluating the model with train and test score
print("training set score :{:.2f}".format(lr.score(X_train,y_train)))
print("test set score :{:.2f}".format(lr.score(X_test,y_test)))

This code will output this

training set score :0.95
test set score :0.61

That’s all! , you trained your first linear regression model.

This model over-fits , which can be solved by scaling but let’s not worry about that now.

Click here to open the github page of this python code.

Strengths and Weaknesses

Linear models are very fast to train and also fast to predict. They scale to very datasets and work well with sparse data.Another strength is that they make it relatively easy to understand how a prediction is made, using the formula we saw earlier .It often perform well when the number of features is large compared to the number of samples>

However in lower-dimensional spaces , other models might yield better generalization performance.


Next week I’m going to develop this linear regression model from scratch in python.Subscribe to get notification of the posts.If you have any question,ask it in forums. Do share this blog and I would love to hear your feedback in comments.

3 Replies to “Linear Regression”

Leave a Reply

Your email address will not be published. Required fields are marked *