<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DSP log &#187; DSP</title>
	<atom:link href="http://www.dsplog.com/category/dsp/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dsplog.com</link>
	<description>Signal Processing for Communication</description>
	<lastBuildDate>Thu, 26 Jan 2012 01:29:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Least Squares in Gaussian Noise &#8211; Maximum Likelihood</title>
		<link>http://www.dsplog.com/2012/01/15/least-squares-gaussian-noise-maximum-likelihood/</link>
		<comments>http://www.dsplog.com/2012/01/15/least-squares-gaussian-noise-maximum-likelihood/#comments</comments>
		<pubDate>Sun, 15 Jan 2012 04:45:20 +0000</pubDate>
		<dc:creator>Krishna Sankar</dc:creator>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[machine_learning]]></category>
		<category><![CDATA[ML]]></category>

		<guid isPermaLink="false">http://www.dsplog.com/?p=1075</guid>
		<description><![CDATA[From the previous posts on Linear Regression (using Batch Gradient Descent, Stochastic Gradient Descent and the closed form solution), we discussed a couple of different ways to estimate the parameter vector in the least squares sense for a given training set. However, how does the least squares criterion work when the training set is corrupted by [...]
Related posts:<ol>
<li><a href='http://www.dsplog.com/2007/07/15/straight-line-fit-using-least-squares-estimate/' rel='bookmark' title='Straight line fit using least squares estimate'>Straight line fit using least squares estimate</a></li>
<li><a href='http://www.dsplog.com/2011/10/29/batch-gradient-descent/' rel='bookmark' title='Batch Gradient Descent'>Batch Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/' rel='bookmark' title='Stochastic Gradient Descent'>Stochastic Gradient Descent</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>From the previous posts on Linear Regression (using <a title="Batch Gradient Descent" href="http://www.dsplog.com/2011/10/29/batch-gradient-descent/" target="_blank">Batch Gradient Descent</a>, <a title="Stochastic Gradient Descent" href="http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/" target="_blank">Stochastic Gradient Descent</a>, <a title="Closed form solution for least squares" href="http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/" target="_blank">Closed form solution</a>), we discussed a couple of different ways to estimate the parameter vector in the least squares sense for a given training set. However, how does the least squares criterion behave when the training set is corrupted by noise? In this post, let us discuss the case where the training set is corrupted by Gaussian noise.</p>
<p><span id="more-1075"></span></p>
<p>For the <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?j^{th}" alt="" align="absmiddle" border="0" /> training example, the system model is:</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?y^{(j)} = \theta^Tx^{(j)} + n^{(j)}" alt="" align="absmiddle" border="0" />,</p>
<p>where,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x^{(j)}" alt="" align="absmiddle" border="0" /> is the input sequence,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?y^{(j)}" alt="" align="absmiddle" border="0" /> is the output sequence,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" align="absmiddle" border="0" /> is the parameter vector and</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?n^{(j)}" alt="" align="absmiddle" border="0" /> is the noise in the observations.</p>
<p>Let us assume that the noise terms <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?n^{(j)}" alt="" align="absmiddle" border="0" /> are independent and identically distributed (i.i.d.) Gaussian random variables with mean 0 and variance <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\sigma^2" alt="" align="absmiddle" border="0" />.</p>
<p>The probability density function of the noise term can be written as,<br />
<img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?p\(n^{(j)}\) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\(n^{(j)}\)^2}{2\sigma^2}}" alt="" align="absmiddle" border="0" />.</p>
<p>This means that the probability density of the output <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?y^{(j)}" alt="" align="absmiddle" border="0" />, given <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x^{(j)}" alt="" align="absmiddle" border="0" /> and parameterised by <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" align="absmiddle" border="0" />, is,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?p\(y^{(j)}|x^{(j)};%20\theta\)%20=%20\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\(y^{(j)}-\theta^Tx^{(j)}\)^2}{2\sigma^2}}" alt="" align="absmiddle" border="0" />.</p>
<p>Let us write the<strong> likelihood of</strong> <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" align="absmiddle" border="0" />, given all the observations of the input <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?X" alt="" align="absmiddle" border="0" /> and the output <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?Y" alt="" align="absmiddle" border="0" />, as,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?L(\theta)=p(Y|X;\theta)" alt="" align="absmiddle" border="0" />.</p>
<p>Given that all the <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?m" alt="" align="absmiddle" border="0" /> observations are independent, the <strong>likelihood of</strong> <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" align="absmiddle" border="0" /> is,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll} L(\theta) &amp; = &amp; \prod_{j=1}^{m}p\(y^{(j)}|x^{(j)};%20\theta\)\\ &amp; = &amp; \prod_{j=1}^{m}%20\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\(y^{(j)}-\theta^Tx^{(j)}\)^2}{2\sigma^2}}\end{array}" alt="" align="absmiddle" border="0" />.</p>
<p>Taking the logarithm of both sides, the<strong> log-likelihood function is,</strong></p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}l(\theta)%20&amp;%20=%20&amp;%20\log%20L(\theta)\\&amp;%20=%20&amp;%20\log%20\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\(y^{(i)}-\theta^Tx^{(i)}\)^2}{2\sigma^2}}\\%20&amp;%20=%20&amp;%20\sum_{i=1}^{m}\log%20%20\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\(y^{(i)}-\theta^Tx^{(i)}\)^2}{2\sigma^2}}\\%20&amp;%20=%20&amp;%20%20m%20\log%20\frac{1}{\sqrt{2\pi\sigma^2}}-\frac{1}{2\sigma^2}\underbrace{\sum_{i=1}^m\(y^{(i)}-\theta^Tx^{(i)}\)^2}\end{array}" alt="" align="absmiddle" border="0" />.</p>
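<p>From the above expression, the first term does not depend on <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" align="absmiddle" border="0" /> and <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\frac{1}{2\sigma^2}" alt="" align="absmiddle" border="0" /> is a positive constant. Since the logarithm is monotonically increasing, maximizing the likelihood function <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?L(\theta)" alt="" align="absmiddle" border="0" /> is the same as maximizing <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?l(\theta)" alt="" align="absmiddle" border="0" />, which in turn is the same as minimizing</p>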
<p><img class="aligncenter" src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\Large \sum_{i=1}^m\(y^{(i)}-\theta^Tx^{(i)}\)^2 = J(\theta)" alt="" align="absmiddle" border="0" /></p>
<p>Recall: this is the same cost function that was <a title="Batch Gradient Descent" href="http://www.dsplog.com/2011/10/29/batch-gradient-descent/" target="_blank">minimized in the Least Squares solution</a>.</p>
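<p>As a quick numerical check of the log-likelihood decomposition above, the Python sketch below evaluates the log-likelihood two ways. The first sums the Gaussian log-densities directly. The second uses the constant term minus J(&#952;)/(2&#963;&#178;). The post itself uses Matlab. The small data set, &#963; and the function names here are made up for illustration.</p>

```python
import math

def predict(theta, x):
    """theta^T x for one training example."""
    return sum(t * xi for t, xi in zip(theta, x))

def log_likelihood_direct(theta, xs, ys, sigma):
    """Sum over j of log N(y^(j); theta^T x^(j), sigma^2)."""
    total = 0.0
    for x, y in zip(xs, ys):
        err = y - predict(theta, x)
        pdf = math.exp(-err ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
        total += math.log(pdf)
    return total

def cost_J(theta, xs, ys):
    """J(theta) = sum over j of (y^(j) - theta^T x^(j))^2."""
    return sum((y - predict(theta, x)) ** 2 for x, y in zip(xs, ys))

def log_likelihood_via_J(theta, xs, ys, sigma):
    """m log(1/sqrt(2 pi sigma^2)) - J(theta) / (2 sigma^2)."""
    m = len(ys)
    const = m * math.log(1.0 / math.sqrt(2 * math.pi * sigma ** 2))
    return const - cost_J(theta, xs, ys) / (2 * sigma ** 2)

# Hypothetical data; each x = [x_0, x_1] with x_0 = 1 as the intercept term
xs = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
ys = [2.9, 5.2, 6.8, 9.1]
theta = [1.0, 2.0]
sigma = 0.5
print(log_likelihood_direct(theta, xs, ys, sigma))
print(log_likelihood_via_J(theta, xs, ys, sigma))
```

<p>Both printed values should agree up to floating-point rounding. Only the J(&#952;) term changes with &#952;.</p>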
<p><strong>Summarizing:</strong></p>
<p>a) When the observations are corrupted by <strong>independent Gaussian Noise</strong>, the<strong> least squares solution</strong> is the <strong>Maximum Likelihood estimate</strong> of the parameter vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" align="absmiddle" border="0" />.</p>
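<p>b) The term <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\frac{1}{\sigma^2}" alt="" align="absmiddle" border="0" /> plays no role in this minimization, since it scales every squared error by the same positive constant. However, if the noise variance differs from one observation to another, each squared error term gets weighted by its own <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\frac{1}{\(\sigma^{(j)}\)^2}" alt="" align="absmiddle" border="0" />, and this needs to be factored in. We will discuss this in another post.</p>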
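<p>A small simulation can illustrate point a). The Python sketch below makes several assumptions for illustration: the true &#952;, &#963;, seed and sample size are arbitrary, and the model is y = &#952;<sub>0</sub> + &#952;<sub>1</sub>x + n with i.i.d. Gaussian noise. It fits &#952; by least squares using the 2x2 normal equations. It then checks that small moves away from that estimate only lower the Gaussian log-likelihood, as expected if the least squares solution is the maximum likelihood estimate.</p>

```python
import math
import random

def fit_least_squares(xs, ys):
    """Closed-form least squares for y = theta0 + theta1*x (2x2 normal equations)."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx
    theta1 = (m * sxy - sx * sy) / det
    theta0 = (sy - theta1 * sx) / m
    return theta0, theta1

def gaussian_log_likelihood(theta0, theta1, xs, ys, sigma):
    """log p(Y | X; theta) for i.i.d. zero-mean Gaussian noise of variance sigma^2."""
    ll = 0.0
    for x, y in zip(xs, ys):
        err = y - (theta0 + theta1 * x)
        ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - err ** 2 / (2 * sigma ** 2)
    return ll

random.seed(1)                                      # arbitrary seed, for repeatability
true_theta0, true_theta1, sigma = 3.0, -0.5, 0.2    # assumed "true" values
xs = [float(i) for i in range(1, 101)]
ys = [true_theta0 + true_theta1 * x + random.gauss(0.0, sigma) for x in xs]

t0, t1 = fit_least_squares(xs, ys)
best = gaussian_log_likelihood(t0, t1, xs, ys, sigma)
# log-likelihood at small perturbations of the least squares estimate
perturbed = [gaussian_log_likelihood(t0 + d0, t1 + d1, xs, ys, sigma)
             for d0, d1 in [(1e-3, 0.0), (-1e-3, 0.0), (0.0, 1e-4), (0.0, -1e-4)]]
print(t0, t1)
print(all(best > p for p in perturbed))
```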
<h2><strong>References</strong></h2>
<p><a title="Lecture Notes1, Chapter 3, Prof. Andrew Ng" href="http://cs229.stanford.edu/notes/cs229-notes1.pdf" target="_blank">CS229 Lecture notes1, Chapter 3 Probabilistic Interpretation, Prof. Andrew Ng</a></p>
<p>Related posts:<ol>
<li><a href='http://www.dsplog.com/2007/07/15/straight-line-fit-using-least-squares-estimate/' rel='bookmark' title='Straight line fit using least squares estimate'>Straight line fit using least squares estimate</a></li>
<li><a href='http://www.dsplog.com/2011/10/29/batch-gradient-descent/' rel='bookmark' title='Batch Gradient Descent'>Batch Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/' rel='bookmark' title='Stochastic Gradient Descent'>Stochastic Gradient Descent</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.dsplog.com/2012/01/15/least-squares-gaussian-noise-maximum-likelihood/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Newton&#8217;s method to find square root, inverse</title>
		<link>http://www.dsplog.com/2011/12/25/newtons-method-square-root-inverse/</link>
		<comments>http://www.dsplog.com/2011/12/25/newtons-method-square-root-inverse/#comments</comments>
		<pubDate>Sun, 25 Dec 2011 14:36:35 +0000</pubDate>
		<dc:creator>Krishna Sankar</dc:creator>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[newtons_method]]></category>

		<guid isPermaLink="false">http://www.dsplog.com/?p=1030</guid>
		<description><![CDATA[Many of us have used Newton&#8217;s method (also known as the Newton-Raphson method) in some form or other. The method has quite a bit of history, starting with the Babylonian way of finding the square root and evolving over the centuries into the present iterative form. In this post, we will describe [...]
Related posts:<ol>
<li><a href='http://www.dsplog.com/2008/07/28/chi-square-random-variable/' rel='bookmark' title='Chi Square Random Variable'>Chi Square Random Variable</a></li>
<li><a href='http://www.dsplog.com/2007/03/19/signal-to-quantization-noise-in-quantized-sinusoidal/' rel='bookmark' title='Signal to quantization noise in quantized sinusoidal'>Signal to quantization noise in quantized sinusoidal</a></li>
<li><a href='http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/' rel='bookmark' title='Closed form solution for linear regression'>Closed form solution for linear regression</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Many of us have used <a title="Newton's method wiki" href="http://en.wikipedia.org/wiki/Newton's_method">Newton&#8217;s method</a> (also known as the Newton-Raphson method) in some form or other. The method has quite a bit of history, starting with the <a title="Babylonian way of finding square root" href="http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method" target="_blank">Babylonian way of finding the square root</a> and evolving over the centuries into the present iterative form. In this post, we describe Newton&#8217;s method and apply it to find the square root and the inverse of a number.</p>
<h2>Geometrical interpretation</h2>
<p><img class="aligncenter size-full wp-image-1033" title="function_tangent_slope" src="http://www.dsplog.com/db-install/wp-content/uploads/2011/12/function_tangent_slope.png" alt="" width="403" height="276" /></p>
<p>We know that the derivative of a function <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f(x)" alt="" align="absmiddle" border="0" /> at <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_0" alt="" align="absmiddle" border="0" /> is the slope of the tangent (red line) at <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_0" alt="" align="absmiddle" border="0" />, i.e.,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f^'(x_0) = \frac{f(x_0)}{x_0-x_1}" alt="" align="absmiddle" border="0" />.</p>
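<p>Rearranging, the point <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_1" alt="" align="absmiddle" border="0" /> where the tangent intercepts the x-axis is,</p>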
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_1 = x_0 - \frac{f(x_0)}{f^'(x_0)}" alt="" align="absmiddle" border="0" />.</p>
<p>From the figure above, we can see that the tangent (red line) intercepts the x-axis at <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_1" alt="" align="absmiddle" border="0" />, which is closer than <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_0" alt="" align="absmiddle" border="0" /> to the point where <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f(x)=0" alt="" align="absmiddle" border="0" />. Repeating this operation iteratively, the estimate converges (from a suitable starting point) to a zero of the function, in other words a root of the function.</p>
<p>In general, for the <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?n^{th}" alt="" align="absmiddle" border="0" /> iteration, the update is:</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\Huge x_{n+1} = x_{n} - \frac{f(x_n)}{f^'(x_n)}" alt="" align="absmiddle" border="0" />.</p>
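<p>The update rule above can be sketched in a few lines of Python. This is a generic illustration, not code from the post. The function name <code>newton</code>, the fixed iteration count and the cube-root example f(x) = x&#179; - 2 are assumptions chosen for illustration.</p>

```python
def newton(f, f_prime, x0, num_iter=10):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) and return every iterate."""
    x = x0
    iterates = []
    for _ in range(num_iter):
        slope = f_prime(x)
        if slope == 0:
            # horizontal tangent: it never meets the x-axis, so the step is undefined
            raise ZeroDivisionError("f'(x_n) = 0, Newton step undefined")
        x = x - f(x) / slope
        iterates.append(x)
    return iterates

# Usage: root of the hypothetical function f(x) = x^3 - 2, i.e. the cube root of 2
roots = newton(lambda x: x ** 3 - 2, lambda x: 3 * x ** 2, x0=1.0)
print(roots[-1])
```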
<h2>Finding square root</h2>
<p>As an example, let us use this method to find the square root of D=100. The function to zero out in the Newton&#8217;s method framework is,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f(x) = x^2 - D" alt="" align="absmiddle" border="0" />, where <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?D=100" alt="" align="absmiddle" border="0" />.</p>
<p>The first derivative is</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f^'(x) = 2x" alt="" align="absmiddle" border="0" />.</p>
<p>The recursive equation is,</p>
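<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_{n+1} = x_{n} - \frac{x_n^2-D}{2x_n} = \frac{1}{2}\(x_n + \frac{D}{x_n}\)" alt="" align="absmiddle" border="0" />,</p>
<p>which is the Babylonian method: the new estimate is the average of the current estimate and D divided by it.</p>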
<p><strong>Matlab code snippet</strong></p>
<pre>clear ; close all;
D = 100; % number whose square root is to be found
x = 1; % initial value
x_v = zeros(1,10); % iterates, one per iteration
for ii = 1:10
	fx = x.^2 - D; % f(x)
	f1x = 2*x; % f'(x)
	x = x - fx/f1x; % Newton update
	x_v(ii) = x;
end
x_v'
% ans =
%    50.5000000000000
%    26.2400990099010
%    15.0255301199868
%    10.8404346730269
%    10.0325785109606
%    10.0000528956427
%    10.0000000001399
%    10.0000000000000
%    10.0000000000000
%    10.0000000000000</pre>
<p>We can see that it converges within around 8 iterations. Experimenting with the initial value:</p>
<p>a) if we start with an initial value of x = -1, we converge to -10.</p>
<p>b) if we start with an initial value of x = 0, the iteration does not converge: since f'(0) = 0, the very first update divides by zero.</p>
<p>and so on&#8230;</p>
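<p>The behaviour for different initial values can be reproduced with a short Python translation of the Matlab loop. The function name <code>newton_sqrt</code> is hypothetical. It is illustrative only, not the post's code.</p>

```python
def newton_sqrt(D, x0, num_iter=10):
    """Newton's method on f(x) = x^2 - D: x_{n+1} = x_n - (x_n^2 - D) / (2 x_n)."""
    x = x0
    for _ in range(num_iter):
        x = x - (x * x - D) / (2 * x)
    return x

print(newton_sqrt(100, 1))    # converges to the positive root
print(newton_sqrt(100, -1))   # converges to the negative root
try:
    newton_sqrt(100, 0)
except ZeroDivisionError:
    print("x0 = 0: f'(x0) = 0, so the first step divides by zero")
```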
<h2>Finding inverse (division)</h2>
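<p>Newton&#8217;s method can also be used to find the inverse of a number D. One way to write the function to zero out is <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f(x) = xD - 1" alt="" align="absmiddle" border="0" />, but this does not help. Since <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f^'(x) = D" alt="" align="absmiddle" border="0" />, the update <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_{n+1} = x_n - \frac{x_nD-1}{D}" alt="" align="absmiddle" border="0" /> itself requires dividing by D, i.e. we would need to know <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\frac{1}{D}" alt="" align="absmiddle" border="0" /> in the first place.</p>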
<p>Alternatively the function to zero out can be written as,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f(x) = \frac{1}{x} - D" alt="" align="absmiddle" border="0" />.</p>
<p>The first derivative is,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f^'(x) = -\frac{1}{x^2}" alt="" align="absmiddle" border="0" />.</p>
<p>The equation in the recursive form is,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}x_{n+1} &amp;= &amp; x_{n} - \(\frac{\frac{1}{x_n}-D}{-\frac{1}{x_n^2}}\)\\ &amp; = &amp; x_n\(2-Dx_n\)\end{array}" alt="" align="absmiddle" border="0" />.</p>
<p><strong>Matlab code snippet</strong></p>
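<pre>clear ; close all;
D = .1; % number whose inverse is to be found
x = [.1:.2:1]; % initial values 0.1, 0.3, 0.5, 0.7, 0.9
x_v = zeros(length(x),10); % iterates, one column per iteration
for ii = 1:10
	fx = (1./x) - D; % f(x) = 1/x - D
	f1x = -1./x.^2; % f'(x) = -1/x^2
	x = x - fx./f1x; % Newton update, equivalent to x = x.*(2 - D*x)
	x_v(:,ii) = x;
end
plot(x_v');
legend('0.1', '0.3', '0.5', '0.7', '0.9');
grid on; xlabel('number of iterations'); ylabel('inverse');
title('finding inverse newton''s method');</pre>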
<p>The following plot shows the inverse computation converging to the correct value (1/D = 10) for the different initial values <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_0" alt="" align="absmiddle" border="0" /> used in the above code snippet.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1043" title="convergence_inverse_newtons_method" src="http://www.dsplog.com/db-install/wp-content/uploads/2011/12/convergence_inverse_newtons_method1.png" alt="" width="448" height="336" /><br />
<strong>Figure : convergence of inverse computation</strong></p>
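<p>Simplifying the Newton update for f(x) = 1/x - D gives x<sub>n+1</sub> = x<sub>n</sub>(2 - D x<sub>n</sub>). This form needs only multiplications and a subtraction, with no division. Writing the error as e<sub>n</sub> = 1 - D x<sub>n</sub> gives e<sub>n+1</sub> = e<sub>n</sub><sup>2</sup>. So for D &gt; 0 the iteration converges when x<sub>0</sub> lies strictly between 0 and 2/D, and diverges otherwise. A starting value with D x<sub>0</sub> closer to 1 converges faster. The Python sketch below uses that multiplication-only form. The name <code>newton_reciprocal</code> and the 20-iteration default are illustrative choices, not from the post.</p>

```python
def newton_reciprocal(D, x0, num_iter=20):
    """Approximate 1/D with only multiplications: x_{n+1} = x_n * (2 - D * x_n).

    For D > 0 this converges when x0 lies strictly between 0 and 2/D.
    """
    x = x0
    for _ in range(num_iter):
        x = x * (2 - D * x)
    return x

for x0 in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(x0, newton_reciprocal(0.1, x0))   # all close to 1/0.1 = 10
print(newton_reciprocal(0.1, 25.0))         # x0 outside (0, 20): diverges
```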
<h2 style="text-align: left;">Finding the minima of a function</h2>
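<p>To find a minimum of a function, we need to find where its derivative becomes zero, i.e. <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f^'(x) = 0" alt="" align="absmiddle" border="0" />. Such a stationary point is a minimum when <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f^{''}(x)&gt;0" alt="" align="absmiddle" border="0" />.</p>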
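<p>Applying Newton&#8217;s method to <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f^'(x)" alt="" align="absmiddle" border="0" /> instead of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?f(x)" alt="" align="absmiddle" border="0" />, the recursive equation becomes:</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_{n+1} = x_n - \frac{f^'(x_n)}{f^{''}(x_n)}" alt="" align="absmiddle" border="0" />.</p>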
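<p>Because this is just the root-finding iteration applied to f'(x), a Python sketch needs only f' and f''. The test function f(x) = e<sup>x</sup> - 2x is a made-up example. It was chosen because f''(x) = e<sup>x</sup> is always positive and the minimum is known in closed form at x = ln 2. The function name <code>newton_minimize</code> is hypothetical.</p>

```python
import math

def newton_minimize(f_prime, f_double_prime, x0, num_iter=20):
    """Newton's method applied to f'(x) = 0: x_{n+1} = x_n - f'(x_n) / f''(x_n)."""
    x = x0
    for _ in range(num_iter):
        x = x - f_prime(x) / f_double_prime(x)
    return x

# Hypothetical example f(x) = exp(x) - 2x: f'(x) = exp(x) - 2, f''(x) = exp(x)
x_min = newton_minimize(lambda x: math.exp(x) - 2, lambda x: math.exp(x), x0=0.0)
print(x_min, math.log(2))
```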
<h2>Thoughts</h2>
<p>We have briefly gone through Newton&#8217;s method and its application to finding the roots of a function, the inverse of a number and the minima of a function. However, there are quite a few aspects which we did not cover, such as:</p>
<p>a) Impact of the initial value on the convergence of the function</p>
<p>b) Rate of the convergence</p>
<p>c) Error bounds of the converged result</p>
<p>d) Conditions where the convergence does not happen</p>
<p>and so on&#8230;</p>
<p>Hoping to discuss those in another post&#8230; <img src='http://www.dsplog.com/db-install/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h2>References</h2>
<p><a title="Newton's method wiki" href="http://en.wikipedia.org/wiki/Newton's_method">Wiki on Newton&#8217;s method</a></p>
<p>Related posts:<ol>
<li><a href='http://www.dsplog.com/2008/07/28/chi-square-random-variable/' rel='bookmark' title='Chi Square Random Variable'>Chi Square Random Variable</a></li>
<li><a href='http://www.dsplog.com/2007/03/19/signal-to-quantization-noise-in-quantized-sinusoidal/' rel='bookmark' title='Signal to quantization noise in quantized sinusoidal'>Signal to quantization noise in quantized sinusoidal</a></li>
<li><a href='http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/' rel='bookmark' title='Closed form solution for linear regression'>Closed form solution for linear regression</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.dsplog.com/2011/12/25/newtons-method-square-root-inverse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Closed form solution for linear regression</title>
		<link>http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/</link>
		<comments>http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/#comments</comments>
		<pubDate>Sun, 04 Dec 2011 13:07:51 +0000</pubDate>
		<dc:creator>Krishna Sankar</dc:creator>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[machine_learning]]></category>

		<guid isPermaLink="false">http://www.dsplog.com/?p=1012</guid>
		<description><![CDATA[In the previous posts on Batch Gradient Descent and Stochastic Gradient Descent, we looked at two iterative methods for finding the parameter vector θ which minimizes the square of the error between the predicted value h_θ(x) and the actual output y for all j values in the training set. A closed form solution for finding the parameter vector θ is possible, and in this post [...]
Related posts:<ol>
<li><a href='http://www.dsplog.com/2011/10/29/batch-gradient-descent/' rel='bookmark' title='Batch Gradient Descent'>Batch Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/' rel='bookmark' title='Stochastic Gradient Descent'>Stochastic Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2008/11/20/linear-to-log-conversion/' rel='bookmark' title='Linear to log conversion'>Linear to log conversion</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>In the previous posts on <a title="Batch Gradient Descent" href="http://www.dsplog.com/2011/10/29/batch-gradient-descent/" target="_blank">Batch Gradient Descent</a> and <a title="Stochastic Gradient Descent" href="http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/" target="_blank">Stochastic Gradient Descent</a>, we looked at two iterative methods for finding the parameter vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" /> which minimizes the square of the error between the predicted value <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?h_%7B%5Ctheta%7D%28x%29" alt="" align="absmiddle" border="0" /> and the actual output <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?y" alt="" align="absmiddle" border="0" /> for all <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?j" alt="" align="absmiddle" border="0" /> values in the training set.</p>
<p>A closed form solution for the parameter vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" /> is possible, and in this post let us explore it. Of course, I thank Prof. Andrew Ng for making all this material publicly available (<a title="Lecture Notes 1 (from Prof. Andrew Ng's course on Machine Learning)" href="http://cs229.stanford.edu/notes/cs229-notes1.pdf" target="_blank">Lecture Notes 1</a>).</p>
<p><span id="more-1012"></span></p>
<h2>Notations</h2>
<p>Let&#8217;s revisit the notations.</p>
<p>Let <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?m" alt="" align="absmiddle" border="0" /> be the number of training examples (in our case, the top 50 articles),</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x" alt="" align="absmiddle" border="0" /> be the input sequence (the page index),</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?y" alt="" align="absmiddle" border="0" /> be the output sequence (the page views for each page index), and</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?n" alt="" align="absmiddle" border="0" /> be the number of features/parameters (=2 for our example).</p>
<p>The pair <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?(x^j,y^j)" alt="" align="absmiddle" border="0" /> corresponds to the <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?j^{th}" alt="" align="absmiddle" border="0" /> training example.</p>
<p>The number of page views for a given page index is predicted using a hypothesis <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?h_%7B%5Ctheta%7D%28x%29" alt="" align="absmiddle" border="0" /> defined as:</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Cbegin%7Barray%7D%7Blll%7Dh_%7B%5Ctheta%7D%28x%29&amp;=&amp;%5Ctheta%7B_0%7Dx_0%20+%20%5Ctheta%7B_1%7Dx_1%5C%5C&amp;=&amp;%5Csum_%7Bi=0%7D%5E%7Bn-1%7D%5Ctheta%7B_i%7Dx_i%5C%5C&amp;=&amp;%5Ctheta%5ETx%5C%5C%5Cend%7Barray%7D" alt="" align="absmiddle" border="0" /></p>
<p>where,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_1" alt="" align="absmiddle" border="0" /> is the page index,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_0\ = \ 1" alt="" align="absmiddle" border="0" />.</p>
<p><strong>Formulating in matrix notations:</strong></p>
<p>The input sequence is,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?X%20=%20\[\begin{array}{mm}x_0^1%20&amp;%20x_1^1%20\\x_0^2%20&amp;%20x_1^2\\%20\vdots%20&amp;%20\vdots%20\\x_0^m%20&amp;%20x_1^m\end{array}\]" alt="" align="absmiddle" border="0" /> of dimension [m x n]</p>
<p>The measured values are,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?Y%20=%20\[\begin{array}{m}y^1\\y^2\\\vdots%20\\y^m\end{array}\]" alt="" align="absmiddle" border="0" /> of dimension [m x 1].</p>
<p>The parameter vector is,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta=\[\begin{array}{m}\theta_0\\\theta_1\end{array}\]" alt="" align="absmiddle" border="0" /> of dimension [n x 1]</p>
<p>The hypothesis term is,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?H_\theta(X)=X\theta=%20\[\begin{array}{mm}x_0^1%20&amp;%20x_1^1%20\\x_0^2%20&amp;%20x_1^2\\%20\vdots%20&amp;%20\vdots%20\\x_0^m%20&amp;%20x_1^m\end{array}\]\[\begin{array}{m}\theta_0\\\theta_1%20\end{array}\]" alt="" align="absmiddle" border="0" /> of dimension [m x 1].</p>
<p>From the above,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?H_\theta(X)-Y=X\theta-Y=%20\[\begin{array}{mm}h_\theta(x^1)-y^1\\h_\theta(x^2)-y^2\\\vdots\\h_\theta(x^m)-y^m\end{array}\]" alt="" align="absmiddle" border="0" />.</p>
<p><strong>Recall</strong> :</p>
<p>Our goal is to find the parameter vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" /> which minimizes the square of the error between the predicted value <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?h_%7B%5Ctheta%7D%28x%29" alt="" align="absmiddle" border="0" /> and the actual output <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?y" alt="" align="absmiddle" border="0" /> for all <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?j" alt="" align="absmiddle" border="0" /> values in the training set i.e.</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\min_{\theta} \sum_{j=1}^m\[h_{\theta}(x^j) - y^j\]^2" alt="" align="absmiddle" border="0" />.</p>
<p>From matrix algebra, we know that</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi? \sum_{j=1}^m\[h_{\theta}(x^j) - y^j\]^2=\(X\theta-Y\)^T\(X\theta-Y\)" alt="" align="absmiddle" border="0" />.</p>
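<p>So we can define the cost function <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J\(\theta\)" alt="" align="absmiddle" border="0" /> as follows. The factor 1/2 does not change the minimizer; it only cancels the factor 2 that appears on differentiation.</p>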
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J\(\theta\)=\frac{1}{2}\sum_{j=1}^m\[h_{\theta}(x^j) - y^j\]^2 = \frac{1}{2}\(X\theta-Y\)^T\(X\theta-Y\)" alt="" align="absmiddle" border="0" />.</p>
<p>To find the value of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" /> which minimizes <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J\(\theta\)" alt="" align="absmiddle" border="0" />, we can differentiate <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J\(\theta\)" alt="" align="absmiddle" border="0" /> with respect to <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" />.</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}\frac{\partial}{\partial\theta}J\(\theta\)&amp;%20=&amp;%20%20\frac{1}{2}\frac{\partial}{\partial\theta}\(X\theta-Y\)^T\(X\theta-Y\)\\&amp;=&amp;\frac{1}{2}\frac{\partial}{\partial\theta}\(\theta^TX^TX\theta%20-%20\theta^TX^TY%20-Y^TX\theta+Y^TY%20\)\\&amp;=&amp;\frac{1}{2}\(2X^TX\theta%20-%202X^TY\)\\&amp;=&amp;X^TX\theta%20-%20X^TY\end{array}" alt="" align="absmiddle" border="0" /></p>
<p>To find the value of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" /> which minimizes <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J\(\theta\)" alt="" align="absmiddle" border="0" />, we set</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\frac{\partial}{\partial\theta}J(\theta)=0" alt="" align="absmiddle" border="0" />,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}\(X^TX\theta%20-%20X^TY\)&amp;=&amp;0\end{array}" alt="" align="absmiddle" border="0" />.</p>
<p>Solving,</p>
<p><img class="aligncenter" src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\Huge\begin{array}{lll}\theta &amp; = &amp; \(X^TX\)^{-1}X^TY\end{array}" alt="" align="absmiddle" border="0" /></p>
<p><strong>Note :</strong> (Update 7th Dec 2011)</p>
<p>As pointed out by Mr. Andre KrishKevich, the above solution is the same as the formula for the linear least squares fit (<a title="Straight line fit using least squares estimate" href="http://www.dsplog.com/2007/07/15/straight-line-fit-using-least-squares-estimate/">linear least squares</a>, <a title="Least Squares in Wiki" href="http://en.wikipedia.org/wiki/Least_squares">least squares in wiki</a>).</p>
<h2 align="absmiddle">Matlab/Octave code snippet</h2>
<pre>clear ;
close all;
x = [1:50].'; % page index
y = [4554 3014 2171 1891 1593 1532 1416 1326 1297 1266 ...
	1248 1052 951 936 918 797 743 665 662 652 ...
	629 609 596 590 582 547 486 471 462 435 ...
	424 403 400 386 386 384 384 383 370 365 ...
	360 358 354 347 320 319 318 311 307 290 ].'; % page views

m = length(y); % store the number of training examples
x = [ ones(m,1) x]; % Add a column of ones to x
n = size(x,2); % number of features
% Normal equations theta = inv(x'*x)*x'*y; solving the linear system with
% backslash gives the same result without forming the inverse explicitly.
theta_vec = (x'*x)\(x'*y);</pre>
<p>The computed <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" align="absmiddle" border="0" /> values are</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}\theta_0&amp;=&amp;1840.618\\\theta_1&amp;=&amp;-39.820\end{array}" alt="" align="absmiddle" border="0" />.</p>
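<p>As a cross-check that needs no matrix library, the same numbers can be reproduced in plain Python. With only two parameters, <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?X^TX" alt="" align="absmiddle" border="0" /> is a 2x2 matrix. The sketch below therefore solves the normal equations with Cramer's rule. That is an illustrative choice for this small case; for larger n, a linear solver is preferable.</p>

```python
# Page views for the top 50 page indices (same data as the Matlab snippet above)
y = [4554, 3014, 2171, 1891, 1593, 1532, 1416, 1326, 1297, 1266,
     1248, 1052, 951, 936, 918, 797, 743, 665, 662, 652,
     629, 609, 596, 590, 582, 547, 486, 471, 462, 435,
     424, 403, 400, 386, 386, 384, 384, 383, 370, 365,
     360, 358, 354, 347, 320, 319, 318, 311, 307, 290]
x = list(range(1, len(y) + 1))  # page index 1..50
m = len(y)

# Entries of X^T X = [[m, sum_x], [sum_x, sum_xx]] and X^T Y = [sum_y, sum_xy]
sum_x = sum(x)
sum_xx = sum(xi * xi for xi in x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Solve (X^T X) theta = X^T Y for theta = [theta0, theta1] by Cramer's rule
det = m * sum_xx - sum_x ** 2
theta0 = (sum_xx * sum_y - sum_x * sum_xy) / det
theta1 = (m * sum_xy - sum_x * sum_y) / det
print(theta0, theta1)  # should match theta_0 = 1840.618, theta_1 = -39.820 above
```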
<p><strong>Note :</strong></p>
<p>a)</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}\frac{\partial}{\partial\theta}\theta^TX^TX\theta%20&amp;%20=%20&amp;%20X^TX\theta+\(X^TX\)^T\theta\\%20&amp;%20=&amp;2X^TX\theta\end{array}" alt="" align="absmiddle" border="0" />, since <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?X^TX" alt="" align="absmiddle" border="0" /> is symmetric.</p>
<p>(<a title="Matrix Calculus Notes" href="http://www.colorado.edu/engineering/cas/courses.d/IFEM.d/IFEM.AppD.d/IFEM.AppD.pdf" target="_blank">Refer: Matrix calculus notes</a> - University of Colorado)</p>
<p>b)</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}-\frac{\partial}{\partial\theta}(\theta^TX^TY)%20&amp;=&amp;-\frac{\partial}{\partial\theta}tr(\theta^TX^TY)%20\\%20&amp;%20=%20&amp;%20%20-\frac{\partial}{\partial\theta}tr(Y^TX\theta)\\&amp;=&amp;-X^TY\end{array}" alt="" align="absmiddle" border="0" /></p>
<p>(<a title="Matrix Calculus Wiki" href="http://en.wikipedia.org/wiki/Matrix_calculus" target="_blank">Refer : Matrix Calculus Wiki</a>)</p>
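<p>The gradient derived above can be sanity-checked numerically against a central finite-difference approximation of J(&#952;) = &#189;(X&#952; - Y)<sup>T</sup>(X&#952; - Y). The gradient is X<sup>T</sup>X&#952; - X<sup>T</sup>Y, computed here as X<sup>T</sup>(X&#952; - Y). The small data set, the test point &#952; and the step size h below are arbitrary choices for illustration.</p>

```python
def cost(theta, X, Y):
    """J(theta) = 1/2 * sum over j of (theta^T x^j - y^j)^2."""
    return 0.5 * sum((sum(t * xi for t, xi in zip(theta, row)) - y) ** 2
                     for row, y in zip(X, Y))

def analytic_gradient(theta, X, Y):
    """X^T X theta - X^T Y, evaluated as X^T (X theta - Y)."""
    residuals = [sum(t * xi for t, xi in zip(theta, row)) - y for row, y in zip(X, Y)]
    return [sum(row[i] * r for row, r in zip(X, residuals)) for i in range(len(theta))]

def numerical_gradient(theta, X, Y, h=1e-6):
    """Central differences (J(theta + h e_i) - J(theta - h e_i)) / (2h)."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += h
        down = list(theta)
        down[i] -= h
        grad.append((cost(up, X, Y) - cost(down, X, Y)) / (2 * h))
    return grad

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]   # rows [x_0 = 1, x_1]
Y = [2.0, 2.5, 4.1, 3.9]
theta = [0.3, 0.8]
print(analytic_gradient(theta, X, Y))
print(numerical_gradient(theta, X, Y))
```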
<p><strong style="font-size: 20px;">References</strong></p>
<p><a href="http://academicearth.org/lectures/supervised-learning-autonomous-deriving">An Application of Supervised Learning – Autonomous Driving (Video Lecture, Class2)</a></p>
<p><a title="CS 229 Machine Learning Course Materials" href="http://cs229.stanford.edu/materials.html" target="_blank">CS 229 Machine Learning Course Materials</a></p>
<p><a title="Matrix Calculus Notes" href="http://www.colorado.edu/engineering/cas/courses.d/IFEM.d/IFEM.AppD.d/IFEM.AppD.pdf" target="_blank">Refer: Matrix calculus notes</a> - University of Colorado</p>
<p><a title="Matrix Calculus Wiki" href="http://en.wikipedia.org/wiki/Matrix_calculus" target="_blank">Matrix Calculus Wiki</a></p>
<p>Related posts:<ol>
<li><a href='http://www.dsplog.com/2011/10/29/batch-gradient-descent/' rel='bookmark' title='Batch Gradient Descent'>Batch Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/' rel='bookmark' title='Stochastic Gradient Descent'>Stochastic Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2008/11/20/linear-to-log-conversion/' rel='bookmark' title='Linear to log conversion'>Linear to log conversion</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Stochastic Gradient Descent</title>
		<link>http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/</link>
		<comments>http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/#comments</comments>
		<pubDate>Mon, 14 Nov 2011 23:56:00 +0000</pubDate>
		<dc:creator>Krishna Sankar</dc:creator>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[machine_learning]]></category>

		<guid isPermaLink="false">http://www.dsplog.com/?p=1000</guid>
		<description><![CDATA[For curve fitting using linear regression, there is a minor variant of the Batch Gradient Descent algorithm, called Stochastic Gradient Descent. In Batch Gradient Descent, each update of the parameter vector loops over all elements of the training set. In Stochastic Gradient Descent, each update of the vector uses only one training example [...]
Related posts:<ol>
<li><a href='http://www.dsplog.com/2012/01/15/least-squares-gaussian-noise-maximum-likelihood/' rel='bookmark' title='Least Squares in Gaussian Noise &#8211; Maximum Likelihood'>Least Squares in Gaussian Noise &#8211; Maximum Likelihood</a></li>
<li><a href='http://www.dsplog.com/2011/10/29/batch-gradient-descent/' rel='bookmark' title='Batch Gradient Descent'>Batch Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/' rel='bookmark' title='Closed form solution for linear regression'>Closed form solution for linear regression</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p></p><p>For curve fitting using linear regression, there is a minor variant of the <a title="Batch Gradient Descent" href="http://www.dsplog.com/2011/10/29/batch-gradient-descent/" target="_blank">Batch Gradient Descent</a> algorithm, called Stochastic Gradient Descent.</p>
<p>In <strong>Batch Gradient Descent</strong>, the parameter vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" /> is updated as</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}\theta_i&amp;:=&amp;\theta_i - \alpha\sum_{j=1}^m\[h_{\theta}(x^j) - y^j\]x_i^j\end{array}" alt="" align="absmiddle" border="0" />.</p>
<p>(each update loops over all elements of the training set)</p>
<p>In <strong>Stochastic Gradient Descent</strong>, each update uses only one training example, the <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?j^{th}" alt="" align="absmiddle" border="0" /> one, i.e.</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{ll}for\ j\ = 1\ to \ m\ : &amp;  \\&amp;\theta_i:=\theta_i - \alpha\[h_{\theta}(x^j) - y^j\]x_i^j\end{array}" alt="" align="absmiddle" border="0" />.</p>
<p>When the training set is large, Stochastic Gradient Descent can be useful. It does not need to go over the full data set before making the first update to the parameter vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" />.</p>
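<p>The two update rules can be contrasted in a small Python sketch (the post's own code is Matlab/Octave). It assumes the straight-line hypothesis with x0 = 1 and the alpha/m step scaling used in the Matlab/Octave snippet. The five data points on the line y = 2 + 3x are made up for illustration.</p>

```python
# Sketch: batch vs stochastic gradient descent for h(x) = theta0 + theta1*x.
# Assumption: step alpha/m, mirroring the 1/m factor in the Matlab/Octave code.
# The data set below is hypothetical (noise-free points on y = 2 + 3x).


def predict(theta, x):
    """Hypothesis h_theta(x) = theta0*x0 + theta1*x1 with x0 = 1."""
    return theta[0] + theta[1] * x


def batch_step(theta, xs, ys, alpha):
    """One batch update: sum the error over ALL examples, then move theta once."""
    m = len(xs)
    g0 = sum(predict(theta, x) - y for x, y in zip(xs, ys))
    g1 = sum((predict(theta, x) - y) * x for x, y in zip(xs, ys))
    return [theta[0] - alpha / m * g0, theta[1] - alpha / m * g1]


def stochastic_pass(theta, xs, ys, alpha):
    """One pass over the data, moving theta after EACH single example."""
    m = len(xs)
    theta = list(theta)
    for x, y in zip(xs, ys):
        err = predict(theta, x) - y  # uses the most recently updated theta
        theta = [theta[0] - alpha / m * err,
                 theta[1] - alpha / m * err * x]
    return theta


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 + 3.0 * x for x in xs]

theta_b = [0.0, 0.0]
theta_s = [0.0, 0.0]
for _ in range(5000):
    theta_b = batch_step(theta_b, xs, ys, alpha=0.05)
    theta_s = stochastic_pass(theta_s, xs, ys, alpha=0.05)
```

<p>The only difference is when θ moves. batch_step accumulates the error over all m examples before changing θ. stochastic_pass changes θ after every example, so later examples in the same pass already see updated parameters. On this noise-free toy data both end up at θ = [2, 3].</p>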
<p>For the same Matlab example used in the previous post, batch and stochastic gradient descent converge to reasonably close values.</p>
<h2 align="absmiddle">Matlab/Octave code snippet</h2>
<pre class="html">clear ;
close all;
x = [1:50].';
y = [4554 3014 2171 1891 1593 1532 1416 1326 1297 1266 ...
	1248 1052 951 936 918 797 743 665 662 652 ...
	629 609 596 590 582 547 486 471 462 435 ...
	424 403 400 386 386 384 384 383 370 365 ...
	360 358 354 347 320 319 318 311 307 290 ].';
%
m = length(y); % store the number of training examples
x = [ ones(m,1) x]; % Add a column of ones to x
n = size(x,2); % number of features
theta_batch_vec = [0 0]';
theta_stoch_vec = [0 0]';
alpha = 0.002;
theta_batch_vec_v = zeros(10000,2);   % batch trajectory, one row per iteration
theta_stoch_vec_v = zeros(m*10000,2); % stochastic trajectory, one row per single-example update
for kk = 1:10000
	% batch gradient descent - loop over all training set
	h_theta_batch = (x*theta_batch_vec);
	h_theta_batch_v = h_theta_batch*ones(1,n);
	y_v = y*ones(1,n);
	theta_batch_vec = theta_batch_vec - alpha*1/m*sum((h_theta_batch_v - y_v).*x).';
	theta_batch_vec_v(kk,:) = theta_batch_vec;
	%j_theta_batch(kk) = 1/(2*m)*sum((h_theta_batch - y).^2);

	% stochastic gradient descent - loop over one training set at a time
	for jj = 1:m
		h_theta_stoch = (x(jj,:)*theta_stoch_vec);
		h_theta_stoch_v = h_theta_stoch*ones(1,n);
		y_v = y(jj,:)*ones(1,n);
		theta_stoch_vec = theta_stoch_vec - alpha*1/m*((h_theta_stoch_v - y_v).*x(jj,:)).';
		%j_theta_stoch(kk,jj) = 1/(2*m)*sum((h_theta_stoch - y).^2);
		theta_stoch_vec_v(m*(kk-1)+jj,:) = theta_stoch_vec;
	end
end

figure;
plot(x(:,2),y,'bs-');
hold on
plot(x(:,2),x*theta_batch_vec,'md-');
plot(x(:,2),x*theta_stoch_vec,'rp-');
legend('measured', 'predicted-batch','predicted-stochastic');
grid on;
xlabel('Page index, x');
ylabel('Page views, y');
title('Measured and predicted page views');

j_theta = zeros(250, 250);   % initialize j_theta
theta0_vals = linspace(-2500, 2500, 250);
theta1_vals = linspace(-50, 50, 250);
for i = 1:length(theta0_vals)
	for j = 1:length(theta1_vals)
		theta_val_vec = [theta0_vals(i) theta1_vals(j)]';
		h_theta = (x*theta_val_vec);
		j_theta(i,j) = 1/(2*m)*sum((h_theta - y).^2);
	end
end
figure;
contour(theta0_vals,theta1_vals,10*log10(j_theta.'))
xlabel('theta_0'); ylabel('theta_1')
title('Cost function J(theta)');
hold on;
plot(theta_stoch_vec_v(:,1),theta_stoch_vec_v(:,2),'rs.');
plot(theta_batch_vec_v(:,1),theta_batch_vec_v(:,2),'kx.');</pre>
<p>The converged values are:</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{llll}\theta_{batch}&amp;=\[1826.189 &amp; -39.392\]\\ \theta_{stochastic}&amp;=\[1806.715 &amp; -35.342\]\\ \end{array}" alt="" align="absmiddle" border="0" />.</p>
<p>&nbsp;</p>
<p><img class="aligncenter size-full wp-image-1001" title="Measured and predicted pageviews Batch and Stochastic gradient descent" src="http://www.dsplog.com/db-install/wp-content/uploads/2011/11/measured_predicted_pageviews_batch_stochastic_gradient_descent.png" alt="Measured and predicted pageviews Batch and Stochastic gradient descent" width="448" height="336" /></p>
<p>In the plot below, Batch Gradient Descent (black) heads straight down to the minimum. Stochastic Gradient Descent (red) keeps hovering around (the thick red band) before settling near the minimum.</p>
<p><img class="aligncenter size-full wp-image-1002" title="Convergence Batch Stochastic Gradient Descent" src="http://www.dsplog.com/db-install/wp-content/uploads/2011/11/convergence_batch_stochastic_gradient_descent.png" alt="" width="448" height="336" /></p>
<h2>References</h2>
<p><a href="http://academicearth.org/lectures/supervised-learning-autonomous-deriving">An Application of Supervised Learning &#8211; Autonomous Deriving (Video Lecture, Class2)</a></p>
<p><a title="CS 229 Machine Learning Course Materials" href="http://cs229.stanford.edu/materials.html" target="_blank">CS 229 Machine Learning Course Materials</a></p>
<p>Related posts:<ol>
<li><a href='http://www.dsplog.com/2012/01/15/least-squares-gaussian-noise-maximum-likelihood/' rel='bookmark' title='Least Squares in Gaussian Noise &#8211; Maximum Likelihood'>Least Squares in Gaussian Noise &#8211; Maximum Likelihood</a></li>
<li><a href='http://www.dsplog.com/2011/10/29/batch-gradient-descent/' rel='bookmark' title='Batch Gradient Descent'>Batch Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/' rel='bookmark' title='Closed form solution for linear regression'>Closed form solution for linear regression</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Batch Gradient Descent</title>
		<link>http://www.dsplog.com/2011/10/29/batch-gradient-descent/</link>
		<comments>http://www.dsplog.com/2011/10/29/batch-gradient-descent/#comments</comments>
		<pubDate>Sat, 29 Oct 2011 06:41:27 +0000</pubDate>
		<dc:creator>Krishna Sankar</dc:creator>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[machine_learning]]></category>

		<guid isPermaLink="false">http://www.dsplog.com/?p=944</guid>
		<description><![CDATA[I happened to stumble upon Prof. Andrew Ng&#8217;s Machine Learning classes, which are available online as part of the Stanford Center for Professional Development. The first lecture in the series discusses fitting parameters for a given data set using linear regression.  To understand this concept, I chose to take data from the top [...]
Related posts:<ol>
<li><a href='http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/' rel='bookmark' title='Stochastic Gradient Descent'>Stochastic Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/' rel='bookmark' title='Closed form solution for linear regression'>Closed form solution for linear regression</a></li>
<li><a href='http://www.dsplog.com/2012/01/15/least-squares-gaussian-noise-maximum-likelihood/' rel='bookmark' title='Least Squares in Gaussian Noise &#8211; Maximum Likelihood'>Least Squares in Gaussian Noise &#8211; Maximum Likelihood</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p></p><p>I happened to stumble upon Prof. Andrew Ng&#8217;s <a title="Machine Learning - Stanford Computer Science Lecture series" href="http://academicearth.org/lectures/applications-of-machine-learning" target="_blank">Machine Learning classes, which are available online</a> as part of the Stanford Center for Professional Development. The <a title="Applications of machine learning" href="http://academicearth.org/lectures/supervised-learning-autonomous-deriving" target="_blank">first lecture in the series</a> discusses fitting parameters for a given data set using linear regression. To understand this concept, I took data from the <a title="Top 50 articles of dsplog.com (Sep 2011)" href="http://www.dsplog.com/2011/10/22/back/" target="_blank">top 50 articles of this blog</a>, ranked by page views in September 2011.</p>
<p><span id="more-944"></span></p>
<h2><strong>Notations</strong></h2>
<p>Let</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?m" alt="" align="absmiddle" border="0" /> be the number of training examples (in our case, the top 50 articles),</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x" alt="" align="absmiddle" border="0" /> be the input sequence (the page index),</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?y" alt="" align="absmiddle" border="0" /> be the output sequence (the page views for each page index), and</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?n" alt="" align="absmiddle" border="0" /> be the number of features/parameters (n = 2 in our example).</p>
<p>The pair <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?(x^j,y^j)" alt="" align="absmiddle" border="0" /> denotes the <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?j^{th}" alt="" align="absmiddle" border="0" /> training example.</p>
<p>Let us try to predict the number of page views for a given page index using a hypothesis <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?h_%7B%5Ctheta%7D%28x%29" alt="" align="absmiddle" border="0" />, defined as:</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Cbegin%7Barray%7D%7Blll%7Dh_%7B%5Ctheta%7D%28x%29&amp;=&amp;%5Ctheta%7B_0%7Dx_0%20+%20%5Ctheta%7B_1%7Dx_1%5C%5C&amp;=&amp;%5Csum_%7Bi=0%7D%5E%7Bn-1%7D%5Ctheta%7B_i%7Dx_i%5C%5C&amp;=&amp;%5Ctheta%5ETx%5C%5C%5Cend%7Barray%7D" alt="" align="absmiddle" border="0" /></p>
<p>where,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_1" alt="" align="absmiddle" border="0" /> is the page index,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?x_0\ = \ 1" alt="" align="absmiddle" border="0" />.</p>
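<p>A minimal Python sketch of this notation (the post itself uses Matlab/Octave). Each input gets the constant feature x0 = 1 prepended, so the hypothesis is a plain dot product θ^T x. The θ values in the last line are arbitrary and only show the call.</p>

```python
# Sketch: rows [x0, x1] = [1, page index] and h_theta(x) = theta^T x.
# The theta passed below is hypothetical, chosen only for illustration.


def design_matrix(page_index):
    """Prepend the constant feature x0 = 1 to every scalar input x1."""
    return [[1.0, float(x1)] for x1 in page_index]


def hypothesis(theta, row):
    """h_theta(x) = sum_i theta_i * x_i = theta^T x for one row."""
    return sum(t * xi for t, xi in zip(theta, row))


X = design_matrix(range(1, 51))  # m = 50 rows, n = 2 features
h = [hypothesis([1000.0, -10.0], row) for row in X]
```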
<p>&nbsp;</p>
<h2>Linear regression using gradient descent</h2>
<p>Given the above hypothesis, let us find the parameter <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" /> that minimizes the sum of the squared errors between the predicted value <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?h_%7B%5Ctheta%7D%28x%29" alt="" align="absmiddle" border="0" /> and the actual output <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?y" alt="" align="absmiddle" border="0" /> over all <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?j" alt="" align="absmiddle" border="0" /> values in the training set, i.e.</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\min_{\theta} \sum_{j=1}^m\[h_{\theta}(x^j) - y^j\]^2" alt="" align="absmiddle" border="0" /></p>
<p>Let us define the cost function <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J\(\theta\)" alt="" align="absmiddle" border="0" /> as,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J\(\theta\)=\frac{1}{2}\sum_{j=1}^m\[h_{\theta}(x^j) - y^j\]^2" alt="" align="absmiddle" border="0" />.</p>
<p>The scaling by the fraction <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\frac{1}{2}" alt="" align="absmiddle" border="0" /> is just for notational convenience: it cancels the factor of 2 that appears when the square is differentiated.</p>
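<p>A Python sketch of this cost function. One difference is worth flagging: the Matlab/Octave snippets in this post compute J with an extra 1/m, i.e. 1/(2m) overall. The optional average flag below reproduces that. Since the extra factor only rescales J, it does not move the minimizing θ. The three data points are hypothetical.</p>

```python
# Sketch: J(theta) = 1/2 * sum_j (h_theta(x^j) - y^j)^2 for rows [1, x].
# average=True divides by m as well, matching the 1/(2m) used in the
# Matlab/Octave snippets. The demo data is hypothetical.


def cost(theta, X, y, average=False):
    """Half the sum of squared errors (optionally divided by m)."""
    residuals = [theta[0] * row[0] + theta[1] * row[1] - yj
                 for row, yj in zip(X, y)]
    j = 0.5 * sum(r * r for r in residuals)
    return j / len(y) if average else j


X_demo = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_demo = [1.0, 2.0, 4.0]
J = cost([0.0, 1.0], X_demo, y_demo)  # residuals 0, 0, -1 -> J = 0.5
```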
<p>Let us start with the parameter vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta= [0\ 0]^T" alt="" align="absmiddle" border="0" /> and keep changing <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" align="absmiddle" border="0" /> to reduce the cost function <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J(\theta)" alt="" align="absmiddle" border="0" />, i.e.</p>
<p align="absmiddle"><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}\theta_i&amp;:=&amp;\theta_i - \alpha \frac{\partial}{\partial\theta_i}J(\theta)\\&amp;=&amp;\theta_i - \alpha\frac{\partial}{\partial\theta_i} \frac{1}{2}\sum_{j=1}^m\[h_{\theta}(x^j) - y^j\]^2\\&amp;=&amp;\theta_i - \alpha\sum_{j=1}^m\[h_{\theta}(x^j) - y^j\]x_i^j\end{array}" alt="" align="absmiddle" border="0" />.</p>
<p align="absmiddle">Once the algorithm has converged, the parameter vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" /> can be used for prediction.</p>
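<p>The update rule above can be sketched in Python (the post's own code is Matlab/Octave). This sketch follows the equation as written, without the extra 1/m that the Matlab/Octave snippet folds into α. Both components are computed from the old θ before either is overwritten, i.e. a simultaneous update. The three data points and α = 0.05 are hypothetical.</p>

```python
# Sketch of theta_i := theta_i - alpha * sum_j (h(x^j) - y^j) * x_i^j
# for rows [x0, x1] = [1, x]. Data and alpha are hypothetical.


def gradient(theta, X, y):
    """dJ/dtheta_i = sum_j (h_theta(x^j) - y^j) * x_i^j for J = 1/2 * sum of squares."""
    residuals = [sum(t * xi for t, xi in zip(theta, row)) - yj
                 for row, yj in zip(X, y)]
    return [sum(r * row[i] for r, row in zip(residuals, X))
            for i in range(len(theta))]


def batch_update(theta, X, y, alpha):
    """One batch gradient descent step; all components use the old theta."""
    g = gradient(theta, X, y)
    return [t - alpha * gi for t, gi in zip(theta, g)]


X_demo = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_demo = [1.0, 2.0, 4.0]
theta = [0.0, 0.0]
for _ in range(2000):
    theta = batch_update(theta, X_demo, y_demo, alpha=0.05)
```

<p>For this toy data the iterates settle at the least squares line θ_0 = -2/3, θ_1 = 3/2. This can be confirmed by solving the 2x2 normal equations by hand.</p>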
<p align="absmiddle"><strong>Note:</strong></p>
<p align="absmiddle">1. For each update of the parameter vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" />, the algorithm processes the full training set. This algorithm is called Batch Gradient Descent.</p>
<p align="absmiddle">2. For the given example with 50 training examples, going over the full training set is computationally feasible. However, when the training set is very large, we may need a slight variant of this scheme, called Stochastic Gradient Descent. We will discuss that in another post.</p>
<p align="absmiddle">3. The derivation of the update for <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta_i" alt="" align="absmiddle" border="0" />, which involves the partial derivative <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\frac{\partial}{\partial \theta_i}J%28%5Ctheta%29" alt="" align="absmiddle" border="0" />, is worth going through in detail. We will discuss that in another post.</p>
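<p>In brief, the derivation referred to in note 3 needs only the chain rule and h_θ(x) = Σ_k θ_k x_k (with x_0 = 1). The 1/2 cancels the 2 from the square, and in the inner derivative only the k = i term depends on θ_i:</p>

```latex
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_i}
&= \frac{\partial}{\partial \theta_i}\,\frac{1}{2}\sum_{j=1}^{m}\left[h_\theta(x^j) - y^j\right]^2 \\
&= \sum_{j=1}^{m}\left[h_\theta(x^j) - y^j\right]\frac{\partial}{\partial \theta_i}\left(\sum_{k=0}^{n-1}\theta_k x_k^j - y^j\right) \\
&= \sum_{j=1}^{m}\left[h_\theta(x^j) - y^j\right] x_i^j
\end{aligned}
```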
<h2 align="absmiddle">Matlab/Octave code snippet</h2>
<pre class="html">clear ;
close all;
x = [1:50].';
y = [4554 3014 2171 1891 1593 1532 1416 1326 1297 1266 ...
	1248 1052 951 936 918 797 743 665 662 652 ...
	629 609 596 590 582 547 486 471 462 435 ...
	424 403 400 386 386 384 384 383 370 365 ...
	360 358 354 347 320 319 318 311 307 290 ].';

m = length(y); % store the number of training examples
x = [ ones(m,1) x]; % Add a column of ones to x
n = size(x,2); % number of features
theta_vec = [0 0]';
alpha = 0.002;
err = zeros(n,10000); % gradient at every iteration (preallocated)
for kk = 1:10000
	h_theta = (x*theta_vec);
	h_theta_v = h_theta*ones(1,n);
	y_v = y*ones(1,n);
	theta_vec = theta_vec - alpha*1/m*sum((h_theta_v - y_v).*x).';
	err(:,kk) = 1/m*sum((h_theta_v - y_v).*x).';
end

figure;
plot(x(:,2),y,'bs-');
hold on
plot(x(:,2),x*theta_vec,'rp-');
legend('measured', 'predicted');
grid on;
xlabel('Page index, x');
ylabel('Page views, y');
title('Measured and predicted page views');</pre>
<p>The computed <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" alt="" align="absmiddle" border="0" /> values are</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\begin{array}{lll}\theta_0&amp;=&amp;1826.189\\\theta_1&amp;=&amp;-39.392\end{array}" alt="" align="absmiddle" border="0" />.</p>
<p>With this hypothesis, the predicted page views are shown as the red curve in the plot below.</p>
<p>In the Matlab code snippet, the number of gradient descent steps is fixed, somewhat arbitrarily, at 10000. Alternatively, one could stop the gradient descent once the cost function <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J%28%5Ctheta%29" alt="" align="absmiddle" border="0" /> is small and/or once the rate of change of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J%28%5Ctheta%29" alt="" align="absmiddle" border="0" /> becomes small.</p>
<p><img class="aligncenter size-full wp-image-977" title="Measured and predicted  pageviews per article sep2011 dsplog.com" src="http://www.dsplog.com/db-install/wp-content/uploads/2011/10/measured_predicted_pageviews_per_article_sep2011_dsplog.png" alt="" width="448" height="336" /></p>
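<p>One way to implement such a stopping rule is sketched below in Python. It stops when the relative drop in J(θ) from one step to the next falls below a tolerance. The values tol = 1e-9 and max_iters = 100000 are arbitrary assumptions, not tuned. α = 0.002 and the 1/(2m) scaling of J follow the Matlab/Octave snippet.</p>

```python
# Sketch: batch gradient descent on the page-view data with a stopping rule
# based on the relative decrease of J(theta). tol and max_iters are
# hypothetical; alpha and the 1/m scaling follow the Matlab/Octave snippet.

Y = [4554, 3014, 2171, 1891, 1593, 1532, 1416, 1326, 1297, 1266,
     1248, 1052, 951, 936, 918, 797, 743, 665, 662, 652,
     629, 609, 596, 590, 582, 547, 486, 471, 462, 435,
     424, 403, 400, 386, 386, 384, 384, 383, 370, 365,
     360, 358, 354, 347, 320, 319, 318, 311, 307, 290]
X1 = list(range(1, len(Y) + 1))  # page index; x0 = 1 is implicit


def cost(theta, xs, ys):
    """J(theta) = 1/(2m) * sum of squared errors (same scaling as the snippet)."""
    m = len(ys)
    return sum((theta[0] + theta[1] * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gd_until_converged(xs, ys, alpha=0.002, tol=1e-9, max_iters=100000):
    m = len(ys)
    theta = [0.0, 0.0]
    history = [cost(theta, xs, ys)]
    for _ in range(max_iters):
        err = [theta[0] + theta[1] * x - y for x, y in zip(xs, ys)]
        theta = [theta[0] - alpha / m * sum(err),
                 theta[1] - alpha / m * sum(e * x for e, x in zip(err, xs))]
        history.append(cost(theta, xs, ys))
        # Stop once the relative improvement of J is at most tol.
        # This also stops right away if J goes up (e.g. alpha too large).
        if tol * history[-2] >= history[-2] - history[-1]:
            break
    return theta, history


theta, history = gd_until_converged(X1, Y)
```

<p>len(history) - 1 is the number of steps actually taken. A tighter tol gets closer to the exact minimum at the cost of more steps. This relative-change test is only one reasonable choice.</p>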
<p>&nbsp;</p>
<p><strong>A couple of things to note:</strong></p>
<p>1. Given that the measured values show a decaying, exponential-like trend, fitting a straight line does not seem like a good idea. Anyhow, given this is the first post in this series, I let it pass. <img src='http://www.dsplog.com/db-install/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>2. The value of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" alt="" align="absmiddle" border="0" /> controls the rate of convergence of the algorithm. If <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" alt="" align="absmiddle" border="0" /> is very small, the algorithm takes small steps and takes longer to converge. A larger <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" alt="" align="absmiddle" border="0" /> makes the algorithm take larger steps and may cause it to diverge.</p>
<p>3. I have not yet figured out how to select a value of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" alt="" align="absmiddle" border="0" /> that gives fast convergence for the data set under consideration. Will figure that out later.</p>
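<p>For this particular cost there is a standard answer for the largest usable step. The snippet uses J(θ) = 1/(2m)·||Xθ − y||² and the update θ := θ − (α/m)·X^T(Xθ − y). The error then evolves as e := (I − αA)e with A = X^T X/m. So gradient descent converges from any start exactly when α lies in the open interval (0, 2/λ_max(A)). The Python sketch below computes that bound for X = [1, x] with x = 1..50, as in the post.</p>

```python
# Sketch: step-size bound for batch gradient descent on the page-view
# regression, using A = X^T X / m with rows [1, x], x = 1..50.
import math

m = 50
xs = list(range(1, m + 1))  # page index x1; the constant feature is x0 = 1

# Entries of the symmetric 2x2 matrix A = X^T X / m
a11 = 1.0
a12 = sum(xs) / m
a22 = sum(x * x for x in xs) / m

# Eigenvalues of a symmetric 2x2 matrix in closed form
mid = (a11 + a22) / 2
radius = math.sqrt(((a11 - a22) / 2) ** 2 + a12 ** 2)
lam_max, lam_min = mid + radius, mid - radius

alpha_max = 2 / lam_max               # alpha at or above this fails to converge (generic start)
alpha_fast = 2 / (lam_max + lam_min)  # fixed step minimizing the worst-case contraction (quadratic cost)
slow_factor = 1 - 0.002 * lam_min     # per-step shrink of the slowest error mode at alpha = 0.002
```

<p>For this data λ_max ≈ 859.3, so alpha_max ≈ 0.00233. The α = 0.002 used in the snippet sits just below that bound, which fits note 2: noticeably larger values diverge. λ_min is only about 0.24, so the slowest error component shrinks by only about 0.05% per step (slow_factor ≈ 0.9995). That is why thousands of iterations are needed. alpha_fast is barely different from alpha_max here because λ_min is so small.</p>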
<p><strong>Plotting the variation of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?J%28%5Ctheta%29" alt="" align="absmiddle" border="0" /> for different values of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" /></strong></p>
<p>&nbsp;</p>
<pre class="html">clear;
j_theta = zeros(250, 250);   % initialize j_theta
theta0_vals = linspace(-5000, 5000, 250);
theta1_vals = linspace(-200, 200, 250);
for i = 1:length(theta0_vals)
	for j = 1:length(theta1_vals)
		theta_val_vec = [theta0_vals(i) theta1_vals(j)]';
		h_theta = (x*theta_val_vec);
		j_theta(i,j) = 1/(2*m)*sum((h_theta - y).^2);
	end
end
figure;
surf(theta0_vals, theta1_vals,10*log10(j_theta.'));
xlabel('theta_0'); ylabel('theta_1');zlabel('10*log10(Jtheta)');
title('Cost function J(theta)');
figure;
contour(theta0_vals,theta1_vals,10*log10(j_theta.'))
xlabel('theta_0'); ylabel('theta_1')
title('Cost function J(theta)');</pre>
<p><img class="aligncenter size-full wp-image-981" title="surf_plot_cost_function" src="http://www.dsplog.com/db-install/wp-content/uploads/2011/10/surf_plot_cost_function.png" alt="" width="448" height="336" /></p>
<p>Given that the surf() plot is a bit unwieldy on my relatively slow desktop, the contour() plot seems to be a much better choice. We can see that the minimum of this cost function lies near the computed <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta" alt="" align="absmiddle" border="0" /> values of</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Cbegin%7Barray%7D%7Blll%7D%5Ctheta_0&amp;=&amp;1826.189%5C%5C%5Ctheta_1&amp;=&amp;-39.392%5Cend%7Barray%7D" alt="" align="absmiddle" border="0" />.</p>
<p>&nbsp;</p>
<p><img class="aligncenter size-full wp-image-982" title="contour_plot_cost_function" src="http://www.dsplog.com/db-install/wp-content/uploads/2011/10/contour_plot_cost_function.png" alt="" width="448" height="336" /></p>
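<p>To check where the bottom of the contour plot actually is, the following Python sketch (not part of the original Matlab/Octave code) solves the 2x2 normal equations (X^T X)θ = X^T y, where the gradient of J vanishes, using Cramer's rule. It then compares the result with 10000 steps of the batch update used above (α = 0.002, 1/m scaling, θ starting at 0).</p>

```python
# Sketch: exact least squares minimum vs 10000 batch gradient descent steps
# on the page-view data from the post.

Y = [4554, 3014, 2171, 1891, 1593, 1532, 1416, 1326, 1297, 1266,
     1248, 1052, 951, 936, 918, 797, 743, 665, 662, 652,
     629, 609, 596, 590, 582, 547, 486, 471, 462, 435,
     424, 403, 400, 386, 386, 384, 384, 383, 370, 365,
     360, 358, 354, 347, 320, 319, 318, 311, 307, 290]
X1 = list(range(1, len(Y) + 1))  # page index x1; x0 = 1 is implicit
m = len(Y)

# Normal equations for h(x) = theta0 + theta1*x, solved by Cramer's rule
sx, sxx = sum(X1), sum(x * x for x in X1)
sy, sxy = sum(Y), sum(x * y for x, y in zip(X1, Y))
det = m * sxx - sx * sx
theta_exact = [(sxx * sy - sx * sxy) / det, (m * sxy - sx * sy) / det]

# 10000 batch gradient descent steps, as in the Matlab/Octave snippet
theta = [0.0, 0.0]
alpha = 0.002
for _ in range(10000):
    err = [theta[0] + theta[1] * x - y for x, y in zip(X1, Y)]
    theta = [theta[0] - alpha / m * sum(err),
             theta[1] - alpha / m * sum(e * x for e, x in zip(err, X1))]
```

<p>For this data the exact minimizer comes out at roughly θ_0 ≈ 1840.62, θ_1 ≈ -39.82. The 10000-step iterate reproduces the 1826.189 / -39.392 quoted above. The small gap lies along the long, shallow axis of the elliptical contours. Along that direction the error shrinks by only about 0.05% per step, so 10000 steps leave a small residual.</p>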
<p>&nbsp;</p>
<p><strong>References</strong></p>
<p><a href="http://academicearth.org/lectures/supervised-learning-autonomous-deriving">An Application of Supervised Learning &#8211; Autonomous Deriving</a></p>
<p>&nbsp;</p>
<p>Related posts:<ol>
<li><a href='http://www.dsplog.com/2011/11/15/stochastic-gradient-descent/' rel='bookmark' title='Stochastic Gradient Descent'>Stochastic Gradient Descent</a></li>
<li><a href='http://www.dsplog.com/2011/12/04/closed-form-solution-linear-regression/' rel='bookmark' title='Closed form solution for linear regression'>Closed form solution for linear regression</a></li>
<li><a href='http://www.dsplog.com/2012/01/15/least-squares-gaussian-noise-maximum-likelihood/' rel='bookmark' title='Least Squares in Gaussian Noise &#8211; Maximum Likelihood'>Least Squares in Gaussian Noise &#8211; Maximum Likelihood</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.dsplog.com/2011/10/29/batch-gradient-descent/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Approximate Vector Magnitude Computation</title>
		<link>http://www.dsplog.com/2009/02/08/approximate-vector-magnitude-computation/</link>
		<comments>http://www.dsplog.com/2009/02/08/approximate-vector-magnitude-computation/#comments</comments>
		<pubDate>Sun, 08 Feb 2009 01:43:35 +0000</pubDate>
		<dc:creator>Krishna Sankar</dc:creator>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[magnitude]]></category>

		<guid isPermaLink="false">http://www.dsplog.com/?p=468</guid>
		<description><![CDATA[In this post, let us discuss a simple, implementation-friendly scheme for computing the absolute value of a complex number Z = X + jY. The technique, called the alpha Max + beta Min algorithm, is discussed in Chapter 13.2 of Understanding Digital Signal Processing, Richard Lyons and is also available online at Digital Signal Processing Tricks &#8211; High-speed vector [...]
Related posts:<ol>
<li><a href='http://www.dsplog.com/2007/12/16/using-cordic-for-phase-and-magnitude-computation/' rel='bookmark' title='Using CORDIC for phase and magnitude computation'>Using CORDIC for phase and magnitude computation</a></li>
<li><a href='http://www.dsplog.com/2008/08/10/ber-bpsk-rayleigh-channel/' rel='bookmark' title='BER for BPSK in Rayleigh channel'>BER for BPSK in Rayleigh channel</a></li>
<li><a href='http://www.dsplog.com/2007/08/19/cordic-for-phase-rotation/' rel='bookmark' title='CORDIC for phase rotation'>CORDIC for phase rotation</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p></p><p>In this post, let us discuss a simple, implementation-friendly scheme for computing the absolute value of a complex number <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?Z = X+jY" border="0" alt="" align="absmiddle" />. The technique, called the <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha Max + \beta Min" border="0" alt="" align="absmiddle" /> (alpha Max plus beta Min) algorithm, is discussed in Chapter 13.2 of <a href="http://www.amazon.com/gp/redirect.html?ie=UTF8&amp;location=http%3A%2F%2Fwww.amazon.com%2FUnderstanding-Digital-Signal-Processing-Richard%2Fdp%2F0201634678&amp;tag=dl04-20&amp;linkCode=ur2&amp;camp=1789&amp;creative=9325">Understanding Digital Signal Processing, Richard Lyons</a> and is also available online at <a href="http://www.embedded.com/design/embeddeddsp/202600924?_requestid=7638">Digital Signal Processing Tricks &#8211; High-speed vector magnitude approximation</a>.</p>
<p>The magnitude of a complex number <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?Z = X+jY" border="0" alt="" align="absmiddle" /> is</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?|Z| = \sqrt{X^2 + Y^2}" border="0" alt="" align="absmiddle" />.</p>
<p>The simplified computation of the absolute value is</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?|Z| \approx \alpha Max + \beta Min" border="0" alt="" align="absmiddle" /> where</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?Max = \max \left( |X|, |Y| \right)" border="0" alt="" align="absmiddle" /></p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?Min = \min \left( |X|, |Y| \right)" border="0" alt="" align="absmiddle" />.</p>
<p><span id="more-468"></span>Different values of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" border="0" alt="" align="absmiddle" /> and <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\beta" border="0" alt="" align="absmiddle" /> can be tried out to evaluate the accuracy of the approximation. For the analysis, we can use a complex number with magnitude 1 and phase from 0 to 180 degrees.</p>
<p><strong>Option#1 </strong></p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" border="0" alt="" align="absmiddle" /> = 1, <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\beta" border="0" alt="" align="absmiddle" /> = 1/2,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?|Z| \approx  Max + \frac{Min}{2}" border="0" alt="" align="absmiddle" /></p>
<p><strong>Option#2 </strong></p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" border="0" alt="" align="absmiddle" /> = 1, <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\beta" border="0" alt="" align="absmiddle" /> = 1/4,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?|Z| \approx  Max + \frac{Min}{4}" border="0" alt="" align="absmiddle" /></p>
<p><strong>Option#3 </strong></p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" border="0" alt="" align="absmiddle" /> = 1, <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\beta" border="0" alt="" align="absmiddle" /> = 3/8</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?|Z| \approx  Max + \frac{3Min}{8}" border="0" alt="" align="absmiddle" /></p>
<p><strong>Option#4 </strong></p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" border="0" alt="" align="absmiddle" /> = 7/8, <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\beta" border="0" alt="" align="absmiddle" /> = 7/16</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?|Z| \approx  \frac{7Max}{8} + \frac{7Min}{16}" border="0" alt="" align="absmiddle" /></p>
<p><strong>Option#5 </strong></p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" border="0" alt="" align="absmiddle" /> = 15/16, <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\beta" border="0" alt="" align="absmiddle" /> = 15/32</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?|Z| \approx  \frac{15Max}{16} + \frac{15Min}{32}" border="0" alt="" align="absmiddle" /></p>
<h2>Simulation Model</h2>
<p>The script performs the following.</p>
<p>(a) Generate a complex number with unit magnitude and phase varying from 0 to 180 degrees.</p>
<p>(b) Find the approximate absolute value using each of the 5 options above.</p>
<p>(c) For each option, find the maximum error, average error and root mean square (RMS) error.</p>
<p>Click here to download <a href="http://www.dsplog.com/db-install/wp-content/uploads/2009/02/script_approximate_vector_magnitude_computation.m">Matlab/Octave script for computing the approximate value of magnitude of a complex number</a></p>
<p><img class="alignnone" title="Plot of approximate value of magnitude of a complex number" src="http://www.dsplog.com/db-install/wp-content/uploads/2009/02/plot_approximate_vector_magnitude_computation.png" alt="" width="632" height="399" /></p>
<p><strong>Figure: Plot of approximate value of magnitude of a complex number</strong></p>
<p><strong></strong></p>
<table style="text-align: center;" border="0" cellspacing="0" cellpadding="0" width="497">
<col width="64"></col>
<col width="73"></col>
<col width="76"></col>
<col width="90"></col>
<col width="93"></col>
<col width="101"></col>
<tbody>
<tr height="63">
<td width="64" height="63">Option</td>
<td width="73">alpha</td>
<td width="76">beta</td>
<td width="90">Maximum Error %</td>
<td width="93">Average Error %</td>
<td width="101">RMS<br />
error %</td>
</tr>
<tr height="20">
<td height="20">1</td>
<td width="73">1</td>
<td width="76">1/2</td>
<td width="90">11.80340</td>
<td width="93">8.67667</td>
<td width="101">9.21159</td>
</tr>
<tr height="20">
<td height="20">2</td>
<td width="73">1</td>
<td width="76">1/4</td>
<td width="90">-11.60134</td>
<td width="93">-0.64520</td>
<td width="101">4.15450</td>
</tr>
<tr height="20">
<td height="20">3</td>
<td width="73">1</td>
<td width="76">3/8</td>
<td width="90">6.80005</td>
<td width="93">4.01573</td>
<td width="101">4.76143</td>
</tr>
<tr height="20">
<td height="20">4</td>
<td width="73">7/8</td>
<td width="76">7/16</td>
<td width="90">-12.50000</td>
<td width="93">-4.90792</td>
<td width="101">5.60480</td>
</tr>
<tr height="20">
<td height="20">5</td>
<td width="73">15/16</td>
<td width="76">15/32</td>
<td width="90">-6.25000</td>
<td width="93">1.88438</td>
<td width="101">3.45847</td>
</tr>
</tbody>
</table>
<p><strong></strong></p>
<p><strong>Table: Error in the approximate value computation with various values of </strong><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" border="0" alt="" align="absmiddle" />, <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\beta" border="0" alt="" align="absmiddle" /></p>
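<p>A Python sketch of steps (a)-(c). This is an independent re-implementation, not the downloadable Matlab/Octave script. It assumes a 0.01-degree phase step and expresses errors in percent of the true magnitude. It takes "Maximum Error" to be the signed error with the largest absolute value, which matches how the table reports negative maxima. Because the phase grid is an assumption, the last digits may differ from the table.</p>

```python
# Sketch of the simulation model for alpha*Max + beta*Min.
# Assumptions: 0.01-degree phase grid; "maximum error" = signed error with
# the largest absolute value; errors in percent of the true magnitude (1).
import math

OPTIONS = [(1, 1 / 2), (1, 1 / 4), (1, 3 / 8), (7 / 8, 7 / 16), (15 / 16, 15 / 32)]


def approx_mag(x, y, alpha, beta):
    """alpha*Max + beta*Min, with Max/Min taken over |x| and |y|."""
    ax, ay = abs(x), abs(y)
    return alpha * max(ax, ay) + beta * min(ax, ay)


def error_stats(alpha, beta, step_deg=0.01):
    """Return (max, average, rms) error in percent over phases 0..180 degrees."""
    n = int(round(180 / step_deg))
    errs = []
    for k in range(n + 1):
        phi = math.radians(k * step_deg)
        # (a) unit-magnitude complex number X + jY at phase phi
        x, y = math.cos(phi), math.sin(phi)
        # (b) approximate magnitude; the true magnitude is 1
        errs.append(100.0 * (approx_mag(x, y, alpha, beta) - 1.0))
    # (c) error statistics
    max_err = max(errs, key=abs)
    avg_err = sum(errs) / len(errs)
    rms_err = math.sqrt(sum(e * e for e in errs) / len(errs))
    return max_err, avg_err, rms_err


for i, (a, b) in enumerate(OPTIONS, start=1):
    mx, av, rms = error_stats(a, b)
    print(f"Option {i}: max {mx:9.5f} %  avg {av:9.5f} %  rms {rms:9.5f} %")
```

<p>With this grid, the maximum errors for Options 1, 3, 4 and 5 come out at about 11.803%, 6.800%, -12.5% and -6.25%. For Option 2 the worst case sits exactly at 45 degrees, where the error is 100·(1.25/sqrt(2) − 1) ≈ -11.612%. That is slightly beyond the tabulated -11.60134, presumably because the original phase grid does not land exactly on 45 degrees.</p>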
<h2>Observations</h2>
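<p>1. The chosen values of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha" border="0" alt="" align="absmiddle" />, <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\beta" border="0" alt="" align="absmiddle" /> allow a simple multiplier-less implementation of the approximation, using only bit shifts and additions.</p>
<p>2. For Options 1 and 3, the largest error is positive, i.e. the approximation overestimates the true magnitude (by up to about 11.8% and 6.8% respectively). Hence we need to allocate extra bits for the output to prevent overflow. Options 2 and 5 also overestimate slightly at some phases. For unit magnitude and phase φ between 0 and 45 degrees, the approximation is <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha\cos\phi + \beta\sin\phi" border="0" alt="" align="absmiddle" />, whose peak value is <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\sqrt{\alpha^2+\beta^2}" border="0" alt="" align="absmiddle" />: about 1.031 and 1.048 for these two options. Only Option 4 stays below the true magnitude for all phases.</p>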
<p>3. The error in the approximate magnitude computation repeats every 90 degrees.</p>
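<p>To illustrate observation 1, here is a Python sketch of the five options using only shifts, additions and subtractions on integer inputs. The helper mag_shift_add is hypothetical, not code from the referenced book. Right shifts truncate, so results can differ from the exact fixed-point products by a few LSBs.</p>

```python
# Sketch: multiplier-less alpha*Max + beta*Min for integer inputs.
# Each constant is built from power-of-two terms (right shifts), e.g.
# 3/8 = 1/4 + 1/8 and 7/16 = 1/2 - 1/16. Right shifts truncate.


def mag_shift_add(x, y, option):
    mx, mn = max(abs(x), abs(y)), min(abs(x), abs(y))
    if option == 1:  # Max + Min/2
        return mx + (mn >> 1)
    if option == 2:  # Max + Min/4
        return mx + (mn >> 2)
    if option == 3:  # Max + 3*Min/8 = Max + Min/4 + Min/8
        return mx + (mn >> 2) + (mn >> 3)
    if option == 4:  # 7*Max/8 + 7*Min/16 = (Max - Max/8) + (Min/2 - Min/16)
        return mx - (mx >> 3) + (mn >> 1) - (mn >> 4)
    if option == 5:  # 15*Max/16 + 15*Min/32 = (Max - Max/16) + (Min/2 - Min/32)
        return mx - (mx >> 4) + (mn >> 1) - (mn >> 5)
    raise ValueError("option must be 1..5")


# |30000 - j40000| = 50000 exactly
for opt in range(1, 6):
    print(opt, mag_shift_add(30000, -40000, opt))
```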
<h2>Reference</h2>
<p>Chapter 13.2 of <a href="http://www.amazon.com/gp/redirect.html?ie=UTF8&amp;location=http%3A%2F%2Fwww.amazon.com%2FUnderstanding-Digital-Signal-Processing-Richard%2Fdp%2F0201634678&amp;tag=dl04-20&amp;linkCode=ur2&amp;camp=1789&amp;creative=9325">Understanding Digital Signal Processing, Richard Lyons</a></p>
<p>Related posts:<ol>
<li><a href='http://www.dsplog.com/2007/12/16/using-cordic-for-phase-and-magnitude-computation/' rel='bookmark' title='Using CORDIC for phase and magnitude computation'>Using CORDIC for phase and magnitude computation</a></li>
<li><a href='http://www.dsplog.com/2008/08/10/ber-bpsk-rayleigh-channel/' rel='bookmark' title='BER for BPSK in Rayleigh channel'>BER for BPSK in Rayleigh channel</a></li>
<li><a href='http://www.dsplog.com/2007/08/19/cordic-for-phase-rotation/' rel='bookmark' title='CORDIC for phase rotation'>CORDIC for phase rotation</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.dsplog.com/2009/02/08/approximate-vector-magnitude-computation/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Using CORDIC for phase and magnitude computation</title>
		<link>http://www.dsplog.com/2007/12/16/using-cordic-for-phase-and-magnitude-computation/</link>
		<comments>http://www.dsplog.com/2007/12/16/using-cordic-for-phase-and-magnitude-computation/#comments</comments>
		<pubDate>Sun, 16 Dec 2007 09:22:17 +0000</pubDate>
		<dc:creator>Krishna Sankar</dc:creator>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[CORDIC]]></category>
		<category><![CDATA[magnitude]]></category>
		<category><![CDATA[phase]]></category>

		<guid isPermaLink="false">http://www.dsplog.com/2007/12/16/using-cordic-for-phase-and-magnitude-computation/</guid>
		<description><![CDATA[In a previous post (here), we looked at how CORDIC (COordinate Rotation DIgital Computer) can rotate a complex number by an angle without using actual multipliers. Let us now look at how CORDIC can be used to find the phase and magnitude of a complex number. Basics The CORDIC [...]
Related posts:<ol>
<li><a href='http://www.dsplog.com/2007/08/19/cordic-for-phase-rotation/' rel='bookmark' title='CORDIC for phase rotation'>CORDIC for phase rotation</a></li>
<li><a href='http://www.dsplog.com/2009/02/08/approximate-vector-magnitude-computation/' rel='bookmark' title='Approximate Vector Magnitude Computation'>Approximate Vector Magnitude Computation</a></li>
<li><a href='http://www.dsplog.com/2007/06/10/first-order-digital-pll-for-tracking-constant-phase-offset/' rel='bookmark' title='First order digital PLL for tracking constant phase offset'>First order digital PLL for tracking constant phase offset</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>In a previous post (<a href="http://www.dsplog.com/2007/08/19/cordic-for-phase-rotation/">here</a>), we looked at how CORDIC (COordinate Rotation DIgital Computer) can rotate a complex number <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{X}" border="0" alt="" align="absmiddle" /> by an angle <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" border="0" alt="" align="absmiddle" /> without using actual multipliers. Let us now look at how CORDIC can be used to find the phase and magnitude of a complex number.</p>
<p><strong><span style="text-decoration: underline;">Basics</span></strong></p>
<p>The CORDIC algorithm is built on successively multiplying the complex number <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{X}" border="0" alt="" align="absmiddle" /> by <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Cmathbf%7BY%7D%20=1%20+%20j2%5E%7B-k%7D" border="0" alt="" align="absmiddle" />. Since the imaginary part of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{Y}" border="0" alt="" align="absmiddle" /> is a power of 2, each multiplication can be carried out with the appropriate &#8216;bit shift&#8217; and an addition. For further details, please refer to the previous post (<a href="http://www.dsplog.com/2007/08/19/cordic-for-phase-rotation/">CORDIC for phase rotation</a>).</p>
<p><span id="more-29"></span></p>
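<p>To make the shift-and-add claim concrete, here is a minimal Python sketch of one CORDIC micro-rotation on integer (fixed-point) values. It multiplies x + jy by 1 + sj2<sup>-k</sup>, where s = +1 or -1 selects the rotation direction. The function name and the use of plain Python integers are assumptions for illustration.</p>

```python
def cordic_step(x: int, y: int, k: int, s: int) -> tuple[int, int]:
    """Multiply (x + jy) by (1 + s*j*2**-k) using only shifts and additions.

    (x + jy)(1 + s*j*2^-k) = (x - s*y*2^-k) + j(y + s*x*2^-k)
    The 2^-k factors are right shifts by k bits (fractional bits are dropped).
    """
    if s > 0:
        return x - (y >> k), y + (x >> k)
    return x + (y >> k), y - (x >> k)


print(cordic_step(1000, 0, 1, +1))    # 1000 * (1 + j/2)         -> (1000, 500)
print(cordic_step(1024, 256, 2, -1))  # (1024 + 256j) * (1 - j/4) -> (1088, 0)
```

<p>Python's <code>&gt;&gt;</code> floors negative numbers, so it behaves like an arithmetic right shift. The discarded fractional bits are the truncation error a fixed-point implementation would also incur.</p>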
<p><strong><span style="text-decoration: underline;">Finding the magnitude and phase</span></strong></p>
<p>Multiplying a complex number <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{X}" border="0" alt="" align="absmiddle" /> by <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?e^{j\theta}" border="0" alt="" align="absmiddle" /> does not change the magnitude of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{X}" border="0" alt="" align="absmiddle" />, since <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?|e^{j\theta}|=1" border="0" alt="" align="absmiddle" />.</p>
<p>Hence, if rotating the phase of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{X}" border="0" alt="" align="absmiddle" /> results in <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{Z}" border="0" alt="" align="absmiddle" /> whose imaginary component is 0 and whose real part is non-negative, then the real part of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{Z}" border="0" alt="" align="absmiddle" /> equals the magnitude of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{X}" border="0" alt="" align="absmiddle" />.</p>
<p>To put it in equations, if</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{Z}=\mathbf{X}e^{j\theta}" border="0" alt="" align="absmiddle" />, where <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5CIm%20%5Cleft%5B%7B%5Cmathbf%7BZ%7D%7D%5Cright%5D=0" border="0" alt="" align="absmiddle" />,</p>
<p>then,</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5CRe%20%5Cleft%5B%7B%5Cmathbf%7BZ%7D%7D%5Cright%5D=%7C%5Cmathbf%7BX%7D%7C" border="0" alt="" align="absmiddle" /> (real part of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{Z}" border="0" alt="" align="absmiddle" /> is the magnitude of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{X}" border="0" alt="" align="absmiddle" />)</p>
<p><img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?%5Ctheta%20=%20-%5Cangle%5Cleft%5B%7B%5Cmathbf%7BX%7D%7D%5Cright%5D" border="0" alt="" align="absmiddle" /> (the rotation angle <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\theta" border="0" alt="" align="absmiddle" /> is the negative of the phase of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{X}" border="0" alt="" align="absmiddle" />)</p>
<p>This is the fundamental idea behind finding the magnitude and phase of a complex number using CORDIC.</p>
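<p>The two relations above can be checked numerically with a minimal Python sketch using the standard <code>cmath</code> module. The input 3 + j4 is an arbitrary example.</p>

```python
import cmath

x = complex(3.0, 4.0)          # example input X (arbitrary choice)
theta = -cmath.phase(x)        # rotation angle: negative of the phase of X
z = x * cmath.exp(1j * theta)  # Z = X e^{j theta}

print(abs(z.imag) < 1e-12)     # True: Im[Z] is zero up to rounding
print(z.real, abs(x))          # Re[Z] and |X| agree (both ~5.0)
```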
<p>The sequence of operations is as follows:</p>
<p>(a) The input complex number is subjected to a series of phase rotations.</p>
<p>(b) The sign of each phase rotation is the negative of the sign of the imaginary component of the current rotated vector <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{Z}" border="0" alt="" align="absmiddle" />.</p>
<p>(c) After multiple iterations, the imaginary component of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{Z}" border="0" alt="" align="absmiddle" /> tends to zero.</p>
<p>(d) <span style="text-decoration: underline;">Then, the real part of the final vector represents the magnitude (after removing the CORDIC gain), and the cumulative rotation phase represents the negative of the phase of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\mathbf{X}" border="0" alt="" align="absmiddle" />.</span></p>
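<p>Steps (a) to (d) can be sketched as a short floating-point Python loop. This is a minimal illustration and not the script from this post. It assumes the input already lies in the first or fourth quadrant, uses 16 iterations (an assumed value), and leaves the CORDIC gain in the returned real part.</p>

```python
import math


def cordic_vectoring(x: float, y: float, iterations: int = 16):
    """Rotate x + jy towards the positive real axis (steps (a)-(d)).

    Assumes x > 0 (first or fourth quadrant). Returns (x, y, phi), where
    x is the magnitude times the CORDIC gain, y is the residual imaginary
    part and phi is the cumulative rotation angle in radians.
    """
    phi = 0.0
    for k in range(iterations):
        s = -1 if y >= 0 else 1                  # (b) rotate against the sign of Im[Z]
        # Multiply by 1 + s*j*2^-k (a shift-and-add in fixed point).
        x, y = x - s * y * 2.0**-k, y + s * x * 2.0**-k
        phi += s * math.atan(2.0**-k)            # accumulate the reference phase alpha_k
    return x, y, phi


x_out, y_out, phi = cordic_vectoring(3.0, 4.0)
print(y_out)                       # (c) close to zero
print(-math.degrees(phi))          # (d) close to the phase of 3 + j4 (about 53.13 deg)
print(x_out / 1.64676025786545)    # (d) close to |3 + j4| = 5 after removing the gain
```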
<p><img src="http://www.dsplog.com/db-install/wp-content/uploads/2008/04/cordic_for_phase_magnitude.jpg" alt="Flowchart of the operations when CORDIC is used for phase and magnitude computation" width="375" height="500" /></p>
<p><strong>Figure: Flow chart for the operations involved in using CORDIC for computing phase and magnitude</strong></p>
<p>The reference phase is <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha_k=\angle\left[1+j2^{-k}\right]" border="0" alt="" align="absmiddle" /> (the phase of <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?1+j2^{-k}" border="0" alt="" align="absmiddle" />), which equals arctan(2<sup>-k</sup>).</p>
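<p>The reference phases can be tabulated with a minimal Python sketch. The choice of 16 iterations is an assumed value. The table also shows why the rotations can reach only a limited range of angles: the phases roughly halve at each step, and their sum stays just under 100 degrees.</p>

```python
import math

# alpha_k = angle(1 + j*2^-k) = atan(2^-k), converted to degrees.
alphas_deg = [math.degrees(math.atan2(2.0**-k, 1.0)) for k in range(16)]

for k, alpha in enumerate(alphas_deg[:6]):
    print(f"k = {k}: alpha_k = {alpha:.6f} deg")

total_deg = sum(alphas_deg)
print(f"sum of alpha_k for k = 0..15: {total_deg:.2f} deg")  # about 99.88 deg
```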
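<p>The scaling factor of 1.64676025786545 removes the &#8216;gain&#8217; accumulated over the successive rotations by <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?1%20+%20sj2%5E%7B-k%7D" border="0" alt="" align="absmiddle" />, where s = +1 or -1 is the direction of each rotation. Each such multiplication scales the magnitude by <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\sqrt{1+2^{-2k}}" border="0" alt="" align="absmiddle" />, and the scaling factor is the product of these terms. Please look at the previous post on CORDIC (<a href="http://www.dsplog.com/2007/08/19/cordic-for-phase-rotation/">here</a>) for details.</p>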
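<p>The scaling factor can be reproduced with a minimal Python sketch. The product of sqrt(1 + 2<sup>-2k</sup>) for k = 0, ..., 15 agrees with 1.64676025786545 to the printed precision. This suggests, as an inference not stated in this post, that the factor was computed for 16 iterations. With more iterations, the product converges to about 1.6467602581.</p>

```python
import math


def cordic_gain(iterations: int) -> float:
    """Product of |1 + s*j*2^-k| = sqrt(1 + 2^-2k) for k = 0 .. iterations-1."""
    gain = 1.0
    for k in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * k))
    return gain


print(f"{cordic_gain(16):.14f}")  # 16 iterations
print(f"{cordic_gain(40):.14f}")  # effectively the limiting value
```

<p>Dividing the final real part by this factor recovers the magnitude.</p>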
<p><span style="text-decoration: underline;">Note:</span></p>
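<p>If the input complex number lies in the second or third quadrant, it must first be moved to the first or fourth quadrant before starting the sequence of operations shown in the figure above. This is because the CORDIC rotations cover only a range of around +/-90 degrees (more precisely, about +/-99.9 degrees, the sum of all <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?\alpha_k" border="0" alt="" align="absmiddle" />). The move is done by multiplying by <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?-j" border="0" alt="" align="absmiddle" /> for a second-quadrant input and adding 90 degrees to the computed phase, or by multiplying by <img src="http://www.dsplog.com/cgi-bin/mimetex.cgi?j" border="0" alt="" align="absmiddle" /> for a third-quadrant input and subtracting 90 degrees from the computed phase.</p>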
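<p>A minimal Python sketch of this quadrant correction follows; the function name and the degree-based phase offset are illustrative assumptions. Multiplying x + jy by -j gives y - jx, and multiplying by j gives -y + jx, so each correction is only a swap and a negation.</p>

```python
import math


def prerotate(x: float, y: float):
    """Move a second/third-quadrant input into the first/fourth quadrant.

    Returns (x', y', offset_deg) such that phase(x + jy) = phase(x' + jy') + offset_deg.
    """
    if x >= 0:
        return x, y, 0.0     # already in the first/fourth quadrant
    if y >= 0:
        return y, -x, 90.0   # second quadrant: multiply by -j (rotate by -90 deg)
    return -y, x, -90.0      # third quadrant: multiply by j (rotate by +90 deg)


for x, y in [(-3.0, 4.0), (-3.0, -4.0), (3.0, -4.0)]:
    xp, yp, offset = prerotate(x, y)
    recovered = math.degrees(math.atan2(yp, xp)) + offset
    print((x, y), "->", (xp, yp), round(recovered, 6), round(math.degrees(math.atan2(y, x)), 6))
```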
<h2>Simulation model</h2>
<p>A simple Matlab/Octave script for computing the phase and magnitude of a complex number using the CORDIC approach is available for download. A quick comparison indicates that the computed and actual values match closely.</p>
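<p>The Matlab/Octave script itself is not reproduced here. The Python code below is a rough analogue, written independently and not transcribed from that script. Its function name, the 16 iterations and the floating-point arithmetic are all assumptions. It combines the quadrant correction, the vectoring iterations and the gain removal, then compares the results against Python's <code>cmath</code> over random inputs.</p>

```python
import cmath
import math
import random

ITERATIONS = 16
ALPHAS = [math.atan(2.0**-k) for k in range(ITERATIONS)]  # reference phases alpha_k
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * k)) for k in range(ITERATIONS))


def cordic_mag_phase(x: float, y: float):
    """Return (magnitude, phase in radians) of x + jy using CORDIC vectoring."""
    # Bring second/third-quadrant inputs into the right half-plane with -j or j.
    offset = 0.0
    if x < 0:
        if y >= 0:
            x, y, offset = y, -x, math.pi / 2    # multiply by -j
        else:
            x, y, offset = -y, x, -math.pi / 2   # multiply by j
    phi = 0.0
    for k in range(ITERATIONS):
        s = -1 if y >= 0 else 1                 # rotate towards Im[Z] = 0
        x, y = x - s * y * 2.0**-k, y + s * x * 2.0**-k
        phi += s * ALPHAS[k]
    return x / GAIN, offset - phi               # undo gain; phase = -cumulative rotation


random.seed(1)
worst_mag = worst_phase = 0.0
for _ in range(1000):
    v = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    mag, ph = cordic_mag_phase(v.real, v.imag)
    worst_mag = max(worst_mag, abs(mag - abs(v)))
    ph_err = (ph - cmath.phase(v) + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    worst_phase = max(worst_phase, abs(ph_err))
print("largest magnitude error:", worst_mag)
print("largest phase error (rad):", worst_phase)
```

<p>With 16 iterations, the residual phase error is bounded by alpha<sub>15</sub> = arctan(2<sup>-15</sup>), roughly 3e-5 rad, so the printed worst-case phase error should be of that order.</p>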
<p>Click <a title="Script for computing the phase and magnitude of a complex number using the CORDIC" href="http://www.dsplog.com/db-install/wp-content/uploads/2008/04/script_cordic_phase_magnitude.m">here</a> to download.</p>
<p><strong><span style="text-decoration: underline;">Reference</span></strong></p>
<p>[DSPGURU-CORDIC] <a href="http://www.dspguru.com/info/faqs/cordic.htm">CORDIC FAQ in dspGuru(TM)</a></p>
<p>Hope this helps.</p>
<p>Krishna</p>
<p>Related posts:<ol>
<li><a href='http://www.dsplog.com/2007/08/19/cordic-for-phase-rotation/' rel='bookmark' title='CORDIC for phase rotation'>CORDIC for phase rotation</a></li>
<li><a href='http://www.dsplog.com/2009/02/08/approximate-vector-magnitude-computation/' rel='bookmark' title='Approximate Vector Magnitude Computation'>Approximate Vector Magnitude Computation</a></li>
<li><a href='http://www.dsplog.com/2007/06/10/first-order-digital-pll-for-tracking-constant-phase-offset/' rel='bookmark' title='First order digital PLL for tracking constant phase offset'>First order digital PLL for tracking constant phase offset</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.dsplog.com/2007/12/16/using-cordic-for-phase-and-magnitude-computation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

<!-- Performance optimized by W3 Total Cache. Learn more: http://www.w3-edge.com/wordpress-plugins/

Served from: www.dsplog.com @ 2012-02-05 02:53:21 -->
