
# Straight line fit using least squares estimate

July 15, 2007

Two points suffice to draw a straight line. However, we are often presented with more than two data points that presumably lie along a straight line. How can one use all the available data points to draw a single straight line?

A natural approach is to draw the straight line that minimizes the squared error between the observed data points and the estimated straight line.

$err = \sum_{i=1}^N\left(y_i - \hat{y}_i\right)^2$, where $y_i$ are the observed data points and $\hat{y}_i$ are the corresponding points on the estimated straight line.
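As a concrete illustration of the error metric, here is a small Python/NumPy sketch (the three data points and the candidate line $\hat{y}=2x$ are made-up values, not from the post):

```python
import numpy as np

# observed data points (hypothetical example)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])

# candidate straight line y_hat = m*x + c
m, c = 2.0, 0.0
y_hat = m * x + c

# sum of squared errors between observations and the candidate line
err = np.sum((y - y_hat) ** 2)
```

For this example the residuals are 0.1, -0.1 and 0.2, so the squared error is 0.06; a different choice of $m$ and $c$ changes this number, and least squares picks the pair that makes it smallest.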

To draw the estimated straight line $\hat{y}=mx+c$, we need to estimate the slope $m$ and the constant $c$.

Formulating as a matrix,

$\left[\begin{matrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{matrix}\right]=\left[\begin{matrix} x_1 & 1\\ x_2 & 1 \\ x_3 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{matrix}\right]\left[\begin{matrix} m \\ c \end{matrix}\right]+\left[\begin{matrix} \eta_1\\ \eta_2 \\ \eta_3 \\ \vdots \\ \eta_N\end{matrix}\right]$

$\mathbf{Y} = \mathbf{X}\left[\begin{matrix} m \\ c \end{matrix}\right]+ \mathbf{N}$,

where,

$\mathbf{Y}$ is the $[N \times 1]$ vector of observations $y_i$,

$\mathbf{X}$ is the $[N \times 2]$ matrix of coefficients built from the $x_i$,

$\left[\begin{matrix} m \\ c \end{matrix}\right]$ is the $[2\times 1]$ vector of the slope and constant to be estimated,

$\mathbf{N}$ is the $[N\times 1]$ vector of noise samples $\eta_i$.

The least square estimate of the straight line is,

$\left[\begin{matrix} m \\ c \end{matrix}\right]=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$.
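The same normal-equations solution can be sketched in Python/NumPy (the noiseless line $y = 3x + 5$ is an assumption chosen so the estimate can be checked by eye; the post's MATLAB version with noise follows below):

```python
import numpy as np

# observations along a known line y = 3x + 5, and their x coordinates
x = np.arange(1.0, 11.0)            # x_1 .. x_N
y = 3.0 * x + 5.0                   # noiseless for illustration

# build the [N x 2] coefficient matrix X, whose rows are [x_i, 1]
X = np.column_stack([x, np.ones_like(x)])

# least squares estimate [m, c] = (X^T X)^{-1} X^T Y
m, c = np.linalg.inv(X.T @ X) @ X.T @ y
```

With noiseless data the estimate recovers the true slope and constant exactly (up to floating point); with noise added, it returns the line minimizing the squared error defined above.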

A simple MATLAB code for least squares straight line fit is given below:

% Least Squares Estimate
rand('state',100); % initializing the random number generator
y = [5:3:50]; % observations, y_i
y = y + 5*rand(size(y)); % y_i with noise added
x = 1:length(y); % the x co-ordinates

% Formulating in matrix form for solving the least squares estimate
Y = y.';
X = [x.' ones(length(x),1)];
alpha = inv(X'*X)*X'*Y; % solving for m and c

% constructing the straight line using the estimated slope and constant
yEst = alpha(1)*x + alpha(2);

close all
figure
plot(x,y,'r.')
hold on
plot(x,yEst,'b')
legend('observations', 'estimated straight line')
grid on
ylabel('observations')
xlabel('x axis')
title('least squares straight line fit')



shashi kant singh October 14, 2010 at 8:35 am

Fit a straight line trend and estimate the trend value for the year 2008.
Year: 2000, 2001, 2002, 2003, 2004, 2005, 2006
Prod.: 110, 112, 115, 119, 121, 123, 126

Krishna Sankar November 17, 2010 at 5:31 am

@shashi: Hope you have solved this

Krishna Sankar May 18, 2008 at 3:18 pm

@Sajith: Is this a homework assignment? In general, I prefer not to solve homework assignments, but rather to help you towards the solution. You can formulate the information in the least squares matrix formulation explained in the post.

Year = [1971 1976 1977 1978 1979 1980 1981 1982]
Sales = [6.7 5.3 4.3 6.1 5.6 7.9 5.8 6.1]
X = [Year' ones(8,1)]
Y = Sales.'

Once you have done that, the solution for the slope and the constant is obtained by the least squares equation.
alpha = inv(X'*X)*X'*Y

Once you have the slope and constant, you can find the y-value (sales) for any x-value (year)

Hope this helps.

Sajith May 18, 2008 at 1:17 am

From the data given below, fit a straight line trend by the method of least squares and also estimate the sales for the year 1984.

Year 1971 1976 1977 1978 1979 1980 1981 1982
Sales 6.7 5.3 4.3 6.1 5.6 7.9 5.8 6.1

Krishna July 17, 2007 at 10:15 am

No special reason, except that when done this way, I have a reasonably clear idea of the underlying operations. That may be helpful if I want to implement it myself.

For the example above, the least squares solution can be obtained either by using X\Y or pinv(X)*Y. However, when X is rank-deficient, the code in the post may fail, and the more 'intelligent' operations X\Y or pinv(X)*Y might be needed.
And a quick check showed that the \ operator runs faster than pinv() or the code in the post.
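The comparison above can be sketched in Python/NumPy (an analogy to the MATLAB operators, not from the original post): np.linalg.lstsq plays the role of X\Y, and np.linalg.pinv that of pinv():

```python
import numpy as np

# same straight-line setup as in the post, noiseless for checkability
x = np.arange(1.0, 11.0)
y = 3.0 * x + 5.0
X = np.column_stack([x, np.ones_like(x)])

# explicit normal equations, as in the post's MATLAB code
alpha_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# a dedicated least squares solver, analogous to MATLAB's X\Y
alpha_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Moore-Penrose pseudoinverse, analogous to pinv(X)*Y;
# this remains well-defined even when X is rank-deficient
alpha_pinv = np.linalg.pinv(X) @ y
```

For a well-conditioned X all three agree; the difference shows up in speed and in robustness when X^T X is singular or nearly so.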

Will Dwinnell July 16, 2007 at 7:10 pm

I find it interesting that you used inv and performed the matrix multiplication to solve this problem. Do you have a reason for doing it this way, instead of using MATLAB’s backslash operator (as in the linked material, below)?

Linear Regression in MATLAB

In curiosity,
Will