
Straight line fit using least squares estimate

on July 15, 2007

Two points suffice for drawing a straight line. In practice, however, we may be presented with more than two data points that presumably lie along a straight line but do not fall on it exactly. How can one use all the available data points to draw that straight line?

A natural approach is to draw the straight line that minimizes the sum of squared errors between the observed data points and the estimated straight line,

$err = \sum_{i=1}^N\left(y_i - \hat{y}_i\right)^2$, where $y_i$ are the observed data points and $\hat{y}_i$ are the corresponding points on the estimated straight line.

To draw the estimated straight line $\hat{y}=mx+c$, we need to estimate the slope $m$ and the constant $c$.
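
To make the error measure concrete, here is a minimal sketch that evaluates it for one candidate line. The three data points and the candidate values of the slope and constant are made up for illustration and are not taken from the post.

```matlab
% Sum of squared errors for one candidate line (illustrative values only)
x = [1 2 3];               % x co-ordinates (made up for this example)
y = [2 3 5];               % observed points y_i (made up for this example)
m = 1; c = 1;              % a candidate slope and constant, not the optimum
yHat = m*x + c;            % points on the candidate line, here [2 3 4]
err = sum((y - yHat).^2);  % squared differences summed over all points
disp(err)
```

Only the third point is off the candidate line, by 1, so the error evaluates to 1. The least squares estimate is the pair of slope and constant for which this number is smallest.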

Formulating the $N$ observations as a matrix equation,

$\left[\begin{array}{c} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{array}\right]=\left[\begin{array}{cc} x_1 & 1\\ x_2 & 1 \\ x_3 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{array}\right]\left[\begin{array}{c} m \\ c \end{array}\right]+\left[\begin{array}{c} \eta_1\\ \eta_2 \\ \eta_3 \\ \vdots \\ \eta_N\end{array}\right]$

$\mathbf{Y} = \mathbf{X}\left[\begin{array}{c} m \\ c \end{array}\right]+ \mathbf{N}$,

where,

$\mathbf{Y}$ is the set of observations $y_i$, a matrix of dimension $[N \times 1]$,

$\mathbf{X}$ is the set of coefficients, with row $i$ equal to $[x_i\ 1]$, a matrix of dimension $[N \times 2]$,

$\left[\begin{array}{c} m \\ c \end{array}\right]$ is the vector of slope and constant to be estimated, of dimension $[2\times 1]$,

$\mathbf{N}$ is the noise $\eta_i$, a matrix of dimension $[N\times1]$.

The least squares estimate of the slope and constant is,

$\left[\begin{array}{c} m \\ c \end{array}\right]=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$.
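
A short sketch of where this expression comes from, assuming $\mathbf{X}^T\mathbf{X}$ is invertible (which holds when the $x_i$ are not all equal). The shorthand $\theta$ for the vector of slope and constant is introduced here only for brevity. Writing the error in matrix form and setting its derivative with respect to $\theta$ to zero gives,

$err(\theta) = (\mathbf{Y}-\mathbf{X}\theta)^T(\mathbf{Y}-\mathbf{X}\theta)$,

$\frac{\partial\, err}{\partial \theta} = -2\mathbf{X}^T(\mathbf{Y}-\mathbf{X}\theta) = 0$,

$\mathbf{X}^T\mathbf{X}\theta = \mathbf{X}^T\mathbf{Y}$,

and multiplying both sides by $(\mathbf{X}^T\mathbf{X})^{-1}$ gives the estimate. For the straight line case the same result can be written out in scalar form as,

$m = \frac{N\sum_{i=1}^N x_iy_i - \sum_{i=1}^N x_i\sum_{i=1}^N y_i}{N\sum_{i=1}^N x_i^2 - \left(\sum_{i=1}^N x_i\right)^2}$, $c = \frac{1}{N}\left(\sum_{i=1}^N y_i - m\sum_{i=1}^N x_i\right)$.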

A simple MATLAB code for least squares straight line fit is given below:

% Least Squares Estimate
rand('state',100); % initializing the random number generation (legacy syntax)
y = 5:3:50; % observations, y_i
y = y + 5*rand(size(y)); % y_i with noise added
x = 1:length(y); % the x co-ordinates

% Formulating in matrix for solving for least squares estimate
Y = y.'; % observations as an [N x 1] column
X = [x.' ones(length(x),1)]; % [N x 2] matrix with rows [x_i 1]
alpha = inv(X'*X)*X'*Y; % solving for m and c

% constructing the straight line using the estimated slope and constant
yEst = alpha(1)*x + alpha(2);

close all
figure
plot(x,y,'r.')
hold on
plot(x,yEst,'b')
legend('observations', 'estimated straight line')
grid on
ylabel('observations')
xlabel('x axis')
title('least squares straight line fit')
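
As a cross-check, the same estimate can be obtained in other ways. The sketch below is an illustration, not part of the listing above: it uses a made-up deterministic data set (so that it does not depend on the random number generator) and compares the explicit formula with MATLAB's backslash operator and with polyfit. The names alphaInv, alphaBs and alphaPf are chosen here for illustration.

```matlab
% Cross-check of the straight line fit using three routes (illustrative data)
x = 1:16;                       % x co-ordinates
y = 3*x + 2 + 0.5*sin(x);       % made-up observations: a line plus a small deterministic wiggle

Y = y.';                        % observations as an [N x 1] column
X = [x.' ones(numel(x),1)];     % [N x 2] matrix with rows [x_i 1]

alphaInv = inv(X'*X)*X'*Y;      % explicit least squares formula, as in the post
alphaBs  = X\Y;                 % backslash solves the same least squares problem
p        = polyfit(x,y,1);      % degree-1 polynomial fit, returns [m c] as a row
alphaPf  = p(:);                % reshape to a column for comparison

% largest disagreement between the routes
disp(max(abs(alphaInv - alphaBs)))
disp(max(abs(alphaPf  - alphaBs)))
```

For a well-conditioned X such as this one, the three estimates should agree up to rounding error. When X is close to rank-deficient, the explicit inverse is the route most likely to lose accuracy.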


shashi kant singh October 14, 2010 at 8:35 am

Fit a straight line trend and estimate the trend value for the year 2008.

Year: 2000, 2001, 2002, 2003, 2004, 2005, 2006
Prod.: 110, 112, 115, 119, 121, 123, 126

Krishna Sankar November 17, 2010 at 5:31 am

@shashi: Hope you have solved this

Krishna Sankar May 18, 2008 at 3:18 pm

@Sajith: Is this a homework assignment? In general, I prefer not to solve homework assignments, but rather to help you towards the solution. You can formulate the information in the least squares matrix formulation explained in the post.

Year = [1971 1976 1977 1978 1979 1980 1981 1982]
Sales = [6.7 5.3 4.3 6.1 5.6 7.9 5.8 6.1]
X = [Year.' ones(8,1)]
Y = Sales.'

Once you have done that, the solution for the slope and the constant is obtained from the least squares equation.
alpha = inv(X'*X)*X'*Y

Once you have the slope and constant, you can find the y-value (sales) for any x-value (year)
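
A minimal sketch of that last step, assuming the Year and Sales vectors above. The names yearQuery and salesEst are introduced here for illustration, and the backslash operator is swapped in for the explicit inverse because the large year values make X'*X poorly conditioned.

```matlab
% Evaluating the fitted line at a new x-value (names yearQuery, salesEst are illustrative)
Year  = [1971 1976 1977 1978 1979 1980 1981 1982];
Sales = [6.7 5.3 4.3 6.1 5.6 7.9 5.8 6.1];

X = [Year.' ones(numel(Year),1)];   % rows [x_i 1]
Y = Sales.';                        % observations as a column
alpha = X\Y;                        % alpha(1) is the slope, alpha(2) the constant

yearQuery = 1984;                               % x-value at which the line is evaluated
salesEst  = alpha(1)*yearQuery + alpha(2);      % y-value on the fitted line
disp(salesEst)
```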

Hope this helps.

Sajith May 18, 2008 at 1:17 am

From the data given below, fit a straight line trend by the method of least squares and also estimate the sales for the year 1984.

Year 1971 1976 1977 1978 1979 1980 1981 1982
Sales 6.7 5.3 4.3 6.1 5.6 7.9 5.8 6.1

Krishna July 17, 2007 at 10:15 am

No special reason, except that when it is done this way, I have a reasonably clear idea of the underlying operations. That may be helpful if I want to implement it myself.

For the example above, the least squares solution can also be obtained by using X\Y or pinv(X)*Y. However, when X is rank-deficient, the code in the post may fail and the more ‘intelligent’ operations X\Y or pinv(X)*Y might be needed.
A quick check also showed that the \ operator runs faster than pinv() or the code in the post.

Will Dwinnell July 16, 2007 at 7:10 pm

I find it interesting that you used inv and performed the matrix multiplication to solve this problem. Do you have a reason for doing it this way, instead of using MATLAB’s backslash operator (as in the linked material, below)?

Linear Regression in MATLAB

In curiosity,
Will