DSP – DSP LOG

Modeling phase noise (frequency domain approach)

Krishna Sankar — Sun, 30 Sep 2012 11:57:30 +0000

In typical wireless system simulations, there is a need to model the phase noise profile of the local oscillator. For example, the phase noise profile of the oscillator can be of the shape described in the post on Phase Noise Power Spectral Density to Jitter. While looking around for example Matlab code, I found two references [1, 2] which define the phase noise profile in the frequency domain and then use ifft() to convert it to time domain samples. This post gives a brief overview of the modeling and provides example Matlab/Octave code.

Modeling

a) Assume a system with sampling frequency $f_s$ Hz and having $N$ samples. In the frequency domain we can define the spectrum over $[-\frac{f_s}{2}, \frac{f_s}{2})$ in steps of $\Delta f = \frac{f_s}{N}$ Hz.

b) Consider a phase noise profile defined as follows :

freq	PSD, dBc/Hz
0	-65
1kHz	-65
10kHz	-95
100kHz	-115
1MHz	-125
10MHz	-125

Table : Example phase noise profile

From the phase noise profile, use linear interpolation (in log10 of the frequency axis) to find the phase noise power spectral density for frequencies over $[-\frac{f_s}{2}, \frac{f_s}{2})$ in steps of $\Delta f = \frac{f_s}{N}$ Hz.

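c) Generate a white Gaussian noise sequence of length $N$ and scale it with the phase noise power spectral density. Writing $w(f)$ for the unit variance complex Gaussian samples and $L(f)$ for the interpolated profile in dBc/Hz (notation assumed here), the script below forms

$$X(f) = N\sqrt{\Delta f}\; w(f)\; 10^{L(f)/20}$$

(Scaling by $\sqrt{\Delta f}$ is to normalize the resolution bandwidth to unity)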

d) Use ifft() to find the time domain samples.

e) On the real samples $\phi(t)$ obtained from step (d), take $e^{j\phi(t)}$ to form the time domain phase noise samples.

Note :

When x is small, $e^{jx} \approx 1 + jx$.
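Step (b) can be sketched in a few lines of Python (used here only for illustration; the post's own script is in Matlab/Octave). The function name `psd_dbc_per_hz` and the constants `PROFILE_HZ` and `PROFILE_DBC` are hypothetical, and the sketch assumes the example profile from the table, where the segment starting at 0 Hz is flat.

```python
import math

# Phase noise profile from the table above (offset in Hz, level in dBc/Hz)
PROFILE_HZ = [0.0, 1e3, 1e4, 1e5, 1e6, 10e6]
PROFILE_DBC = [-65.0, -65.0, -95.0, -115.0, -125.0, -125.0]

def psd_dbc_per_hz(f_hz):
    """Interpolate the profile linearly in dB against log10(frequency)."""
    f = abs(f_hz)  # the profile is mirrored for negative offsets
    if f >= PROFILE_HZ[-1]:
        return PROFILE_DBC[-1]
    segments = zip(PROFILE_HZ, PROFILE_DBC, PROFILE_HZ[1:], PROFILE_DBC[1:])
    for f1, p1, f2, p2 in segments:
        if f1 <= f < f2:
            if p1 == p2:  # flat segment, this also avoids log10(0)
                return p1
            slope = (p2 - p1) / (math.log10(f2) - math.log10(f1))  # dB/decade
            return p1 + slope * (math.log10(f) - math.log10(f1))
    raise ValueError("frequency outside profile")

fs_hz, n = 20e6, 10 ** 5
delta_f = fs_hz / n                                   # frequency resolution
grid_hz = [k * delta_f for k in range(n // 2 + 1)]    # 0 .. fs/2
psd_ssb_db = [psd_dbc_per_hz(f) for f in grid_hz]     # single sideband profile
```

Halfway (in log10 terms) between 1 kHz and 10 kHz the level is halfway between -65 and -95 dBc/Hz, which is the behaviour the interpolation loop in the script relies on.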

Example Matlab script

% Script for simulating an example phase noise profile
% ----------------------------------------------------------

clear all; close all;
fs_Hz    = 20e6;
N        = 10^5;
nIter    = 100;

% phase noise profile
psd_f_hz           = [  0  1e3  1e4  1e5  1e6  10e6];
psd_val_dbc_per_hz = [-65  -65  -95 -115 -125  -125];

% defining the frequency vector
freq_v_Hz = [0:N/2]/N*fs_Hz;
delta_f   = fs_Hz/N;
slope    = [psd_val_dbc_per_hz(2:end) - psd_val_dbc_per_hz(1:end-1) ]./...
		(log10(psd_f_hz(2:end)) - log10(psd_f_hz(1:end-1)));
constant = 10.^(psd_val_dbc_per_hz(1:end-1)/10).* ...
		(psd_f_hz(1:end-1).^(-slope/10));
integral = constant.*(psd_f_hz(2:end).^(1+slope/10) - ...
			psd_f_hz(1:end-1).^(1+slope/10) )./(1+slope/10);
%% finding the rms jitter
% finding index with slope == -10
idx = find(slope==-10);
integral(idx) = constant(idx).*(log(psd_f_hz(idx+1)) - log(psd_f_hz(idx)));
rms_jitter_radians  = sqrt(2*integral);
integrated_jitter_radians = sqrt(2*sum(integral))

% interpolating the phase noise psd values
psd_ssb_dB = -Inf*ones(1,N/2+1); % from [0:N/2]
for ii=1:length(psd_f_hz)-1
   [tt1 fl_idx ] = (min(abs(psd_f_hz(ii) - freq_v_Hz)));
   [tt2 fr_idx ] = (min(abs(psd_f_hz(ii+1) - freq_v_Hz)));	
   fvec = [freq_v_Hz(fl_idx):delta_f:freq_v_Hz(fr_idx)];
   pvec = slope(ii)*log10(fvec+eps) + psd_val_dbc_per_hz(ii) - slope(ii)*log10(psd_f_hz(ii)+eps);
   psd_ssb_dB(fl_idx:fr_idx) = pvec;
end

% forming the full vector from [-N/2:-1 0:N/2-1 ]/N*fs_Hz
psd_dB                  = -Inf*ones(1,N);
psd_dB([-N/2:-1]+N/2+1) = psd_ssb_dB([N/2+1:-1:2]);
psd_dB([0:N/2-1]+N/2+1) = psd_ssb_dB(1:N/2);

psd_linear = 10.^(psd_dB/20);

for (jj = 1:nIter)
   % defining frequency vector
   phase_noise_freq       = 1/sqrt(2)*(randn(1,N) + j*randn(1,N));
   phase_noise_freq_scale = N*sqrt(delta_f)*phase_noise_freq;
   phase_noise_freq_psd   = phase_noise_freq_scale .*psd_linear;

   % converting to time domain
   phase_noise_td      = ifft(fftshift(phase_noise_freq_psd));
   pn_td               = exp(j*(sqrt(2)*real(phase_noise_td)));

   % for estimating jitter and plotting 
   pn_without_carrier  = (pn_td - 1);
   est_jitter_pwr_radians(jj) = mean(pn_without_carrier.*conj(pn_without_carrier));

   hF           = 1/(N*sqrt(delta_f))*fft(pn_without_carrier,N);
   hFPwr(jj,:)  = hF.*conj(hF);
end

est_integrated_jitter_radians = sqrt(mean(est_jitter_pwr_radians));
title_str = sprintf('Phase noise profile, est jitter %2.5f radians (expected %2.5f radians)', est_integrated_jitter_radians, integrated_jitter_radians);

figure
semilogx( [-N/2:N/2-1]/N*fs_Hz, 10*log10(fftshift(mean(hFPwr))), 'r^-' );
hold on;grid on;
semilogx([0:N/2]/N*fs_Hz,psd_ssb_dB,'mp-');
semilogx(psd_f_hz,psd_val_dbc_per_hz,'bs-');
legend('est-freq-response','interpolated','original'); % order matches the three semilogx calls above
xlabel('freq, Hz'); ylabel('dBc/Hz');
axis([1 10e6 -140 -50]); title(title_str);

Figure : Example phase noise profile (expected and simulated)
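The "expected" jitter in the plot title comes from integrating the profile segment by segment: on each segment the PSD is a straight line in dB versus log10(frequency), i.e. a power law $c f^{a}$ in linear units, which integrates in closed form (with a logarithm when the slope is exactly -10 dB/decade). A minimal Python sketch of that calculation, with the hypothetical helper name `segment_power`:

```python
import math

freq_hz = [0.0, 1e3, 1e4, 1e5, 1e6, 10e6]
psd_dbc = [-65.0, -65.0, -95.0, -115.0, -125.0, -125.0]

def segment_power(f1, p1, f2, p2):
    """Integral of the linear PSD over [f1, f2] for one profile segment."""
    if p1 == p2:                       # flat segment (also covers f1 == 0)
        return 10 ** (p1 / 10) * (f2 - f1)
    a = (p2 - p1) / (math.log10(f2) - math.log10(f1)) / 10  # PSD(f) = c * f**a
    c = 10 ** (p1 / 10) * f1 ** (-a)
    if abs(a + 1) < 1e-12:             # slope of -10 dB/decade: integral of c/f
        return c * math.log(f2 / f1)
    return c * (f2 ** (a + 1) - f1 ** (a + 1)) / (a + 1)

ssb_power = sum(
    segment_power(f1, p1, f2, p2)
    for f1, p1, f2, p2 in zip(freq_hz, psd_dbc, freq_hz[1:], psd_dbc[1:])
)
# factor 2 accounts for both sidebands, as in the script
integrated_jitter_rad = math.sqrt(2 * ssb_power)
```

For the example profile this evaluates to roughly 0.031 radians, dominated by the flat region below 1 kHz and the first roll-off segment.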

Summary

The above approach allows modeling an arbitrary phase noise power spectral density. However, it needs a large ifft() of length $N$, which can slow down the simulation.

References

[1] Phase Noise by Alex Bar-Guy, 27 Oct 2005 (Updated 08 Dec 2005) http://www.mathworks.com/matlabcentral/fileexchange/8844-phase-noise

[2] Baseband-equivalent phase noise model, submitted by Markus Nentwig on Dec 18 2011 http://www.dsprelated.com/showcode/246.php

The post Modeling phase noise (frequency domain approach) appeared first on DSP LOG.

Weighted Least Squares and locally weighted linear regression

Krishna Sankar — Sun, 05 Feb 2012 11:32:57 +0000

In the post on Closed Form Solution for Linear regression, we computed the parameter vector which minimizes the square of the error between the predicted value and the actual output for all values in the training set. In that model all the values in the training set are given equal importance. Let us consider the case where it is known that some observations are more important than others. This post discusses the case where some observations need to be given more weight than others (also known as weighted least squares).

Notations

Let’s revisit the notations.

$m$ be the number of training set (in our case top 50 articles),

$x$ be the input sequence (the page index),

$y$ be the output sequence (the page views for each page index)

$n$ be the number of features/parameters (=2 for our example).

The value of $(x^{(i)}, y^{(i)})$ corresponds to the $i^{th}$ training set.

Let $w^{(i)}$ be the weight given to the $i^{th}$ training set.

The number of page views for a given page index is predicted using a hypothesis $h_\theta(x)$ defined as :

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1$$

where $x_1$ is the page index and $x_0 = 1$.

The goal is to find the parameter vector $\theta$ which minimizes the square of the error between the predicted value and the actual output for all values in the training set, with weight $w^{(i)}$, i.e.

$$\min_\theta \sum_{i=1}^{m} w^{(i)}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

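From matrix algebra, we know that

$$\sum_{i=1}^{m} w^{(i)}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = (X\theta - y)^T W (X\theta - y)$$

where,

$W = \mathrm{diag}(w^{(1)}, \ldots, w^{(m)})$ is the diagonal matrix of dimension [m x m].

$X$ is the input sequence of dimension [m x n]

$y$ is the measured values of dimension [m x 1]

$\theta$ is the parameter vector of dimension [n x 1].

Defining the cost function as,

$$J(\theta) = \frac{1}{2m}(X\theta - y)^T W (X\theta - y)$$

To find the value of $\theta$ which minimizes $J(\theta)$, we can differentiate $J(\theta)$ with respect to $\theta$, i.e.

$$\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m}\left(X^T W X\theta - X^T W y\right)$$

To find the value of $\theta$ which minimizes $J(\theta)$, we set

$$\frac{\partial J(\theta)}{\partial \theta} = 0 \;\Rightarrow\; X^T W X\theta = X^T W y$$

The weighted least squares solution is,

$$\theta = (X^T W X)^{-1} X^T W y$$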
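For the two parameter case (a straight line), the weighted normal equations $X^T W X\theta = X^T W y$ reduce to a 2 x 2 system that can be solved by hand. The Python sketch below does exactly that; the function name `weighted_least_squares` and the toy data are assumptions made for illustration and are not from the original post.

```python
def weighted_least_squares(x, y, w):
    """Solve (X^T W X) theta = X^T W y for the line theta0 + theta1 * x."""
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s_w * s_wxx - s_wx ** 2          # determinant of X^T W X
    theta0 = (s_wxx * s_wy - s_wx * s_wxy) / det
    theta1 = (s_w * s_wxy - s_wx * s_wy) / det
    return theta0, theta1

# toy data: the first three points lie on y = 2x, the last one is an outlier
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 20.0]
theta_equal = weighted_least_squares(x, y, [1.0, 1.0, 1.0, 1.0])
theta_down = weighted_least_squares(x, y, [1.0, 1.0, 1.0, 0.0])
```

With equal weights the outlier drags the slope up; giving it zero weight recovers the line through the remaining points, which is the effect the weight matrix $W$ is meant to provide.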

Local weights using exponential function

As given in Chapter 4, Locally weighted linear regression, of the CS229 Lecture notes1 by Prof. Andrew Ng, let us assume a weighting function defined as,

$$w^{(i)} = \exp\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right)$$

When computing the predicted value for an observation $x$, less weightage is given to observations far away from $x$.

Further, an additional parameter $\tau$ controls the width of the weighting function. The higher the value of $\tau$, the wider the weight function.

Figure: Plot of the exponential weighting function for different values of $\tau$
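To get a feel for the width parameter, the weights $\exp\left(-(x^{(i)} - x)^2 / (2\tau^2)\right)$ can be evaluated for a query point in the middle of the page index range. This is a small Python sketch; `local_weights` is a hypothetical helper name and the query point 25 is an arbitrary choice.

```python
import math

def local_weights(x_train, x_query, tau):
    """Exponential weights centred on the query point."""
    return [math.exp(-((xi - x_query) ** 2) / (2.0 * tau ** 2)) for xi in x_train]

x_train = list(range(1, 51))          # page indices 1..50
for tau in (1, 10, 25):               # the values used in the Matlab snippet
    w = local_weights(x_train, 25, tau)
    # weight at the query point, 5 pages away, and 25 pages away
    print(tau, w[24], w[29], w[49])
```

The weight at the query point itself is always 1; for small `tau` it collapses quickly with distance, so only the nearest observations influence the fit.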

Matlab/Octave code snippet

clear ;
close all;
x = [1:50].';
y = [4554 3014 2171 1891 1593 1532 1416 1326 1297 1266 ...
	1248 1052 951 936 918 797 743 665 662 652 ...
	629 609 596 590 582 547 486 471 462 435 ...
	424 403 400 386 386 384 384 383 370 365 ...
	360 358 354 347 320 319 318 311 307 290 ].';

m = length(y); % store the number of training examples
x = [ ones(m,1) x]; % Add a column of ones to x
n = size(x,2); % number of features
theta_vec = inv(x'*x)*x'*y;
tau = [1 10 25 ];
y_est = zeros(length(tau),length(x));
for kk = 1:length(tau)
	for ii = 1:length(x);
		w_ii = exp(-(x(ii,2) - x(:,2)).^2./(2*tau(kk)^2));
		W = diag(w_ii);
		theta_vec = inv(x'*W*x)*x'*W*y;
		y_est(kk, ii) = x(ii,:)*theta_vec;
	end
end

figure;
plot(x(:,2),y,'ks-'); hold on
plot(x(:,2),y_est(1,:),'bp-');
plot(x(:,2),y_est(2,:),'rx-');
plot(x(:,2),y_est(3,:),'go-');
legend('measured', 'predicted, tau=1', 'predicted, tau=10','predicted, tau=25');
grid on;
xlabel('Page index, x');
ylabel('Page views, y');
title('Measured and predicted page views with weighted least squares');

Observations

a) For a smaller value of $\tau$ (=1), the measured and predicted values are almost on top of each other.

b) For a higher value of $\tau$ (=25), the predicted value is close to the curve obtained from the no weighting case.

c) When predicting using locally weighted least squares, we need to have the training set handy to compute the weighting function. In contrast, for the unweighted case one can discard the training set once the parameter vector $\theta$ is computed.

References

CS229 Lecture notes1, Chapter 4 Locally weighted linear regression, Prof. Andrew Ng

Least Squares in Gaussian Noise – Maximum Likelihood

Krishna Sankar — Sun, 15 Jan 2012 04:45:20 +0000

In the previous posts on Linear Regression (using Batch Gradient Descent, Stochastic Gradient Descent, Closed form solution), we discussed a couple of different ways to estimate the parameter vector in the least square error sense for the given training set. However, how does the least square error criterion work when the training set is corrupted by noise? In this post, let us discuss the case where the training set is corrupted by Gaussian noise.

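For the training set, the system model is :

$$y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$$

where,

$x^{(i)}$ is the input sequence,

$y^{(i)}$ is the output sequence,

$\theta$ is the parameter vector and

$\epsilon^{(i)}$ is the noise in the observations.

Let us assume that the noise terms $\epsilon^{(i)}$ are independent and identically distributed, following a Gaussian probability density with mean 0 and variance $\sigma^2$.

The probability density function of the noise term can be written as,

$$p(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(\epsilon^{(i)})^2}{2\sigma^2}\right)$$

This means that the probability of the output $y^{(i)}$ given $x^{(i)}$ and parameterised by $\theta$ is,

$$p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$$

Let us write the likelihood of $\theta$, given all the observations of the input sequence $X$ and output $y$, as,

$$L(\theta) = p(y \mid X; \theta)$$

Given that all the $m$ observations are independent, the likelihood of $\theta$ is,

$$L(\theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$$

Taking logarithm on both sides, the log-likelihood function is,

$$\log L(\theta) = m\log\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2}\sum_{i=1}^{m}\left(y^{(i)} - \theta^T x^{(i)}\right)^2$$

From the above expression, we can see that maximizing the likelihood function is the same as minimizing

$$\sum_{i=1}^{m}\left(y^{(i)} - \theta^T x^{(i)}\right)^2$$

Recall: Up to a positive scale factor, this is the same cost function which was minimized in the Least Squares solution.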
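The equivalence can be checked numerically: on noisy data, the Gaussian log-likelihood evaluated at the least squares estimate should be larger than at any perturbed parameter vector, whatever value of $\sigma$ is plugged in. The Python sketch below uses synthetic data; the true parameters, noise level and seed are arbitrary assumptions.

```python
import math
import random

random.seed(0)
theta_true, sigma = (2.0, 0.5), 0.3
x = [0.1 * k for k in range(50)]
y = [theta_true[0] + theta_true[1] * xi + random.gauss(0.0, sigma) for xi in x]

def log_likelihood(theta, sigma):
    """Gaussian log-likelihood of theta for the line theta0 + theta1 * x."""
    sq = sum((yi - theta[0] - theta[1] * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * len(x) * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)

def least_squares_line():
    """Closed form least squares fit of a straight line."""
    m = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    theta1 = (m * sxy - sx * sy) / (m * sxx - sx ** 2)
    return (sy - theta1 * sx) / m, theta1

theta_ls = least_squares_line()
ll_at_ls = log_likelihood(theta_ls, sigma)
ll_shifted = log_likelihood((theta_ls[0] + 0.05, theta_ls[1]), sigma)
```

Here `ll_at_ls` exceeds `ll_shifted`, and the same ordering holds for other perturbations and other values of `sigma`, since `sigma` only scales and offsets the squared error term.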

Summarizing:

a) When the observations are corrupted by independent Gaussian noise, the least squares solution is the Maximum Likelihood estimate of the parameter vector $\theta$.

b) The term $\sigma^2$ does not play a role in this minimization. However, if the noise variance of each observation is different, this needs to be factored in. We will discuss this in another post.

References

CS229 Lecture notes1, Chapter 3 Probabilistic Interpretation, Prof. Andrew Ng

Newton’s method to find square root, inverse

Krishna Sankar — Sun, 25 Dec 2011 14:36:35 +0000

Some of us would have used Newton’s method (also known as Newton-Raphson method) in some form or other. The method has quite a bit of history, starting with the Babylonian way of finding the square root and later over centuries reaching the present recursive way of finding the solution. In this post, we will describe Newton’s method and apply it to find the square root and the inverse of a number.

Geometrical interpretation

We know that the derivative of a function $f(x)$ at $x = x_0$ is the slope of the tangent (red line) at $x_0$, i.e.,

$$f'(x_0) = \frac{f(x_0) - 0}{x_0 - x_1}$$

Rearranging, the intercept of the tangent at the x-axis is,

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

From the figure, we can see that the tangent (red line) intercepts the x-axis at $x_1$, which is closer to the $x$ where $f(x) = 0$ compared to $x_0$. Keep on doing this operation recursively, and it converges to the zero of the function, or in other words the root of the function.

In general, for the $n^{th}$ iteration, the equation is :

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
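The iteration $x_{n+1} = x_n - f(x_n)/f'(x_n)$ can be written once for any function and derivative pair. The following Python sketch is a minimal version with a hypothetical function name `newton`; it uses $f(x) = x^2 - 100$ as the test function and stops early when the update becomes negligible.

```python
def newton(f, f_prime, x0, n_iter=20, tol=1e-12):
    """Newton-Raphson iteration starting from x0."""
    x = x0
    for _ in range(n_iter):
        slope = f_prime(x)
        if slope == 0.0:
            # horizontal tangent: it never intercepts the x-axis
            raise ZeroDivisionError("derivative is zero, cannot take a Newton step")
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

root_pos = newton(lambda x: x * x - 100.0, lambda x: 2.0 * x, x0=1.0)
root_neg = newton(lambda x: x * x - 100.0, lambda x: 2.0 * x, x0=-1.0)
```

Starting from a positive value the iteration settles on the positive root, and from a negative value on the negative root; starting exactly at 0 raises the error because the tangent there is horizontal.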

Finding square root

Let us, for example, try to use this method for finding the square root of D=100. The function to zero out in the Newton's method framework is,

$f(x) = x^2 - D$, where $D = 100$.

The first derivative is

$$f'(x) = 2x$$

The recursive equation is,

$$x_{n+1} = x_n - \frac{x_n^2 - D}{2x_n}$$

Matlab code snippet

clear ; close all;
D = 100; % number to find the square root
x = 1; % initial value
for ii = 1:10
	fx = x.^2 - D;
	f1x = 2*x;
	x = x - fx/f1x;
	x_v(ii) = x;
end
x_v' =
   50.5000000000000
   26.2400990099010
   15.0255301199868
   10.8404346730269
   10.0325785109606
   10.0000528956427
   10.0000000001399
   10.0000000000000
   10.0000000000000
   10.0000000000000

We can see that it converges within around 8 iterations. Further, playing around with the initial value,

a) if we start with an initial value of x = -1, then we will converge to -10.

b) if we start with an initial value of x = 0, then we will not converge, as the derivative is zero there and the update divides by zero

and so on…

Finding inverse (division)

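Newton's method can be used to find the inverse of a variable D. One way to write the function to zero out is $f(x) = x - \frac{1}{D}$, but we soon realize that this does not work as we need to know $\frac{1}{D}$ in the first place.

Alternatively the function to zero out can be written as,

$$f(x) = \frac{1}{x} - D$$

The first derivative is,

$$f'(x) = -\frac{1}{x^2}$$

The equation in the recursive form is,

$$x_{n+1} = x_n - \frac{\frac{1}{x_n} - D}{-\frac{1}{x_n^2}} = x_n\left(2 - D x_n\right)$$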
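Simplifying the Newton step for $f(x) = \frac{1}{x} - D$ gives $x_{n+1} = x_n(2 - D x_n)$, which needs only multiplications and a subtraction. A minimal Python sketch follows; the function name `reciprocal` is hypothetical. The relative error $1 - D x_n$ is squared at every step, so the iteration converges only when the initial value satisfies $0 < x_0 < 2/D$.

```python
def reciprocal(d, x0, n_iter=10):
    """Approximate 1/d with the division-free Newton step x*(2 - d*x)."""
    x = x0
    history = []
    for _ in range(n_iter):
        x = x * (2.0 - d * x)     # Newton step for f(x) = 1/x - d
        history.append(x)
    return history

good = reciprocal(0.1, 0.9)       # 0 < x0 < 2/d = 20: heads towards 10
slow = reciprocal(0.1, 0.1)       # also converges, but starts far from 10
bad = reciprocal(0.1, 25.0)       # x0 > 2/d: runs away from the answer
```

This is why hardware dividers that use this scheme take care to pick the starting value inside the convergence interval.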

Matlab code snippet

clear ; close all;
D = .1; % number to find the inverse of
x = [.1:.2:1]; % initial value
for ii = 1:10
	fx = (1./x) - D;
	f1x = -1./x.^2;
	x = x - fx./f1x;
	x_v(:,ii) = x;
end
plot(x_v');
legend('0.1', '0.3', '0.5', '0.7', '0.9');
grid on; xlabel('number of iterations'); ylabel('inverse');
title('finding inverse newton''s method');

The following plot shows the convergence of the inverse computation to the right value for different initial values $x_0$ in this example Matlab code snippet.

Figure : convergence of inverse computation

Finding the minima of a function

To find the minima of a function $f(x)$, we need to find where the derivative of the function becomes zero, i.e. $f'(x) = 0$.

Using Newton's method, the recursive equation becomes :

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}$$
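Applying the root finding iteration to the derivative gives $x_{n+1} = x_n - f'(x_n)/f''(x_n)$. As a small Python sketch, take the illustrative function $f(x) = \frac{x^4}{4} - x$ (an assumption chosen for this example), for which $f'(x) = x^3 - 1$ and $f''(x) = 3x^2$, so the minimum is at $x = 1$.

```python
def newton_minimize(f_prime, f_second, x0, n_iter=20):
    """Find a stationary point by applying Newton's method to f'(x) = 0."""
    x = x0
    for _ in range(n_iter):
        x = x - f_prime(x) / f_second(x)
    return x

# f(x) = x**4 / 4 - x, so f'(x) = x**3 - 1 and f''(x) = 3 * x**2
x_min = newton_minimize(lambda x: x ** 3 - 1.0, lambda x: 3.0 * x ** 2, x0=2.0)
```

Note that the iteration only locates a point where the derivative vanishes; the second derivative has to be positive there for it to be a minimum, which holds in this example.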

Thoughts

We have briefly gone through Newton's method and its applications to find the roots of a function, inverse, minima etc. However, there are quite a few aspects which we did not go over, like :

a) Impact of the initial value on the convergence of the function

b) Rate of the convergence

c) Error bounds of the converged result

d) Conditions where the convergence does not happen

and so on…

Hoping to discuss those in another post…

References

Wiki on Newton’s method

Closed form solution for linear regression

Krishna Sankar — Sun, 04 Dec 2011 13:07:51 +0000

In the previous post on Batch Gradient Descent and Stochastic Gradient Descent, we looked at two iterative methods for finding the parameter vector which minimizes the square of the error between the predicted value and the actual output for all values in the training set.

A closed form solution for finding the parameter vector is possible, and in this post let us explore that. Of course, I thank Prof. Andrew Ng for making all this material available in the public domain (Lecture Notes 1, retrieved 11th Sep 2024).

Notations

Let’s revisit the notations.

$m$ be the number of training set (in our case top 50 articles),

$x$ be the input sequence (the page index),

$y$ be the output sequence (the page views for each page index)

$n$ be the number of features/parameters (=2 for our example).

The value of $(x^{(i)}, y^{(i)})$ corresponds to the $i^{th}$ training set

The number of page views for a given page index is predicted using a hypothesis $h_\theta(x)$ defined as :

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1$$

where,

$x_1$ is the page index,

$x_0 = 1$.

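Formulating in matrix notations:

The input sequence is,

$$X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} \\ x_0^{(2)} & x_1^{(2)} \\ \vdots & \vdots \\ x_0^{(m)} & x_1^{(m)} \end{bmatrix}$$ of dimension [m x n]

The measured values are,

$$y = \begin{bmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{bmatrix}^T$$ of dimension [m x 1].

The parameter vector is,

$$\theta = \begin{bmatrix} \theta_0 & \theta_1 \end{bmatrix}^T$$ of dimension [n x 1]

The hypothesis term is,

$$X\theta = \begin{bmatrix} h_\theta(x^{(1)}) & h_\theta(x^{(2)}) & \cdots & h_\theta(x^{(m)}) \end{bmatrix}^T$$ of dimension [m x 1].

From the above,

$$X\theta - y = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} & \cdots & h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}^T$$

Recall :

Our goal is to find the parameter vector $\theta$ which minimizes the square of the error between the predicted value and the actual output for all values in the training set i.e.

$$\min_\theta \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

From matrix algebra, we know that

$$(X\theta - y)^T (X\theta - y) = \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

So we can now define the cost function as,

$$J(\theta) = \frac{1}{2m}(X\theta - y)^T (X\theta - y)$$

To find the value of $\theta$ which minimizes $J(\theta)$, we can differentiate $J(\theta)$ with respect to $\theta$.

$$\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m}\left(X^T X\theta - X^T y\right)$$

To find the value of $\theta$ which minimizes $J(\theta)$, we set

$$\frac{\partial J(\theta)}{\partial \theta} = 0$$

Solving,

$$\theta = (X^T X)^{-1} X^T y$$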
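For a straight line fit the closed form $\theta = (X^T X)^{-1} X^T y$ involves only a 2 x 2 inverse, so it can be evaluated without any matrix library. The Python sketch below does this on the page view data used in this series; the variable names are chosen for this example only.

```python
x = list(range(1, 51))
y = [4554, 3014, 2171, 1891, 1593, 1532, 1416, 1326, 1297, 1266,
     1248, 1052, 951, 936, 918, 797, 743, 665, 662, 652,
     629, 609, 596, 590, 582, 547, 486, 471, 462, 435,
     424, 403, 400, 386, 386, 384, 384, 383, 370, 365,
     360, 358, 354, 347, 320, 319, 318, 311, 307, 290]

m = len(y)
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# X^T X = [[m, sx], [sx, sxx]] and X^T y = [sy, sxy]; invert the 2x2 directly
det = m * sxx - sx ** 2
theta0 = (sxx * sy - sx * sxy) / det
theta1 = (m * sxy - sx * sy) / det

# at the solution the residual is orthogonal to both columns of X
residual = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
```

The orthogonality of `residual` to the column of ones and to the page index column is just the normal equation $X^T(X\theta - y) = 0$ restated, and is a convenient sanity check on any least squares solver.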

Note : (Update 7th Dec 2011)

As pointed out by Mr. Andre KrishKevich, the above solution is the same as the formula for the linear least squares fit (linear least squares, least squares in wiki)

Matlab/Octave code snippet

clear ;
close all;
x = [1:50].';
y = [4554 3014 2171 1891 1593 1532 1416 1326 1297 1266 ...
	1248 1052 951 936 918 797 743 665 662 652 ...
	629 609 596 590 582 547 486 471 462 435 ...
	424 403 400 386 386 384 384 383 370 365 ...
	360 358 354 347 320 319 318 311 307 290 ].';

m = length(y); % store the number of training examples
x = [ ones(m,1) x]; % Add a column of ones to x
n = size(x,2); % number of features
theta_vec = inv(x'*x)*x'*y;

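The computed values are approximately $\theta_0 \approx 1840.6$ and $\theta_1 \approx -39.82$ (obtained by evaluating the closed form expression on the data above).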

Note :

(Refer: Matrix calculus notes – University of Colorado)

(Refer : Matrix Calculus Wiki)

References

An Application of Supervised Learning – Autonomous Deriving (Video Lecture, Class2)

CS 229 Machine Learning Course Materials

Refer: Matrix calculus notes – University of Colorado

Matrix Calculus Wiki

Stochastic Gradient Descent

Krishna Sankar — Mon, 14 Nov 2011 23:56:00 +0000

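For curve fitting using linear regression, there exists a minor variant of the Batch Gradient Descent algorithm, called Stochastic Gradient Descent.

In Batch Gradient Descent, the parameter vector $\theta$ is updated as,

$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

(loop over all elements of training set in one iteration)

For Stochastic Gradient Descent, at each iteration the algorithm goes over only one among the training set, i.e. for each $i$ the vector gets updated as,

$$\theta_j := \theta_j - \alpha\frac{1}{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

(the $\frac{1}{m}$ factor follows the script below; it only rescales $\alpha$)

When the training set is large, Stochastic Gradient Descent can be useful (as we need not go over the full data to get the first set of the parameter vector $\theta$)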
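The difference between the two updates is easiest to see side by side. The Python sketch below uses a tiny noise-free data set, y = 1 + 2x (an assumption made so that the exact answer is known), and keeps the same `alpha / m` scaling for both variants as the Matlab snippet does.

```python
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi for xi in x]     # exact line, theta = (1, 2)
m = len(x)
alpha = 0.1

def batch_gd(n_iter):
    """One parameter update per pass, using the error of all samples."""
    t0, t1 = 0.0, 0.0
    for _ in range(n_iter):
        err = [t0 + t1 * xi - yi for xi, yi in zip(x, y)]
        g0 = sum(err) / m
        g1 = sum(e * xi for e, xi in zip(err, x)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return t0, t1

def stochastic_gd(n_epochs):
    """One parameter update per sample, m updates per pass."""
    t0, t1 = 0.0, 0.0
    for _ in range(n_epochs):
        for xi, yi in zip(x, y):
            e = t0 + t1 * xi - yi
            t0, t1 = t0 - alpha / m * e, t1 - alpha / m * e * xi
    return t0, t1

theta_batch = batch_gd(2000)
theta_stoch = stochastic_gd(2000)
```

On this data both variants end up at the same parameter vector; the stochastic version simply gets there through many small, noisier steps.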

For the same Matlab example used in the previous post, we can see that both batch and stochastic gradient descent converged to reasonably close values.

Matlab/Octave code snippet

clear ;
close all;
x = [1:50].';
y = [4554 3014 2171 1891 1593 1532 1416 1326 1297 1266 ...
	1248 1052 951 936 918 797 743 665 662 652 ...
	629 609 596 590 582 547 486 471 462 435 ...
	424 403 400 386 386 384 384 383 370 365 ...
	360 358 354 347 320 319 318 311 307 290 ].';
%
m = length(y); % store the number of training examples
x = [ ones(m,1) x]; % Add a column of ones to x
n = size(x,2); % number of features
theta_batch_vec = [0 0]';
theta_stoch_vec = [0 0]';
alpha = 0.002;
err = [0 0]';
theta_batch_vec_v = zeros(10000,2);
theta_stoch_vec_v = zeros(50*10000,2);
for kk = 1:10000
	% batch gradient descent - loop over all training set
	h_theta_batch = (x*theta_batch_vec);
	h_theta_batch_v = h_theta_batch*ones(1,n);
	y_v = y*ones(1,n);
	theta_batch_vec = theta_batch_vec - alpha*1/m*sum((h_theta_batch_v - y_v).*x).';
	theta_batch_vec_v(kk,:) = theta_batch_vec;
	%j_theta_batch(kk) = 1/(2*m)*sum((h_theta_batch - y).^2);

	% stochastic gradient descent - loop over one training set at a time
	for (jj = 1:50)
		h_theta_stoch = (x(jj,:)*theta_stoch_vec);
		h_theta_stoch_v = h_theta_stoch*ones(1,n);
		y_v = y(jj,:)*ones(1,n);
		theta_stoch_vec = theta_stoch_vec - alpha*1/m*((h_theta_stoch_v - y_v).*x(jj,:)).';
		%j_theta_stoch(kk,jj) = 1/(2*m)*sum((h_theta_stoch - y).^2);
		theta_stoch_vec_v(50*(kk-1)+jj,:) = theta_stoch_vec;
	end
end

figure;
plot(x(:,2),y,'bs-');
hold on
plot(x(:,2),x*theta_batch_vec,'md-');
plot(x(:,2),x*theta_stoch_vec,'rp-');
legend('measured', 'predicted-batch','predicted-stochastic');
grid on;
xlabel('Page index, x');
ylabel('Page views, y');
title('Measured and predicted page views');

j_theta = zeros(250, 250);   % initialize j_theta
theta0_vals = linspace(-2500, 2500, 250);
theta1_vals = linspace(-50, 50, 250);
for i = 1:length(theta0_vals)
	  for j = 1:length(theta1_vals)
		theta_val_vec = [theta0_vals(i) theta1_vals(j)]';
		h_theta = (x*theta_val_vec);
		j_theta(i,j) = 1/(2*m)*sum((h_theta - y).^2);
    end
end
figure;
contour(theta0_vals,theta1_vals,10*log10(j_theta.'))
xlabel('theta_0'); ylabel('theta_1')
title('Cost function J(theta)');
hold on;
plot(theta_stoch_vec_v(:,1),theta_stoch_vec_v(:,2),'r.-'); % stochastic path (a line spec takes only one marker)
plot(theta_batch_vec_v(:,1),theta_batch_vec_v(:,2),'kx-'); % batch path

The converged values are :

From the below plot, we can see that Batch Gradient Descent (black line) goes straight down to the minima, whereas Stochastic Gradient Descent (red line) keeps hovering around (thick red line) before going down to the minima.

References

An Application of Supervised Learning – Autonomous Deriving (Video Lecture, Class2)

CS 229 Machine Learning Course Materials

Batch Gradient Descent

Krishna Sankar — Sat, 29 Oct 2011 06:41:27 +0000

I happened to stumble on Prof. Andrew Ng’s Machine Learning classes which are available online as part of Stanford Center for Professional Development. The first lecture in the series discusses the topic of fitting parameters for a given data set using linear regression. For understanding this concept, I chose to take data from the top 50 articles of this blog based on the pageviews in the month of September 2011.

Notations

Let

$m$ be the number of training set (in our case top 50 articles),

$x$ be the input sequence (the page index),

$y$ be the output sequence (the page views for each page index)

$n$ be the number of features/parameters (=2 for our example).

The value of $(x^{(i)}, y^{(i)})$ corresponds to the $i^{th}$ training set

Let us try to predict the number of page views for a given page index using a hypothesis $h_\theta(x)$, where $h_\theta(x)$ is defined as :

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1$$

where,

$x_1$ is the page index,

$x_0 = 1$.

Linear regression using gradient descent

Given the above hypothesis, let us try to figure out the parameter vector $\theta$ which minimizes the square of the error between the predicted value and the actual output for all values in the training set i.e.

$$\min_\theta \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Let us define the cost function as,

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

The scaling by the fraction $\frac{1}{2m}$ is just for notational convenience.

Let us start with some parameter vector $\theta$, and keep changing $\theta$ to reduce the cost function $J(\theta)$, i.e.

$$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta) = \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

The parameter vector after algorithm convergence can be used for prediction.

Note :

1. For each update of the parameter vector $\theta$, the algorithm processes the full training set. This algorithm is called Batch Gradient Descent.

2. For the given example with 50 training sets, going over the full training set is computationally feasible. However, when the training set is very large, we need to use a slight variant of this scheme, called Stochastic Gradient Descent. We will discuss that in another post.

3. The proof of the derivation of $\frac{\partial}{\partial\theta_j}J(\theta)$, involving the differential with respect to $\theta_j$, will be of interest. We will discuss that in another post.

Matlab/Octave code snippet

clear ;
close all;
x = [1:50].';
y = [4554 3014 2171 1891 1593 1532 1416 1326 1297 1266 ...
	1248 1052 951 936 918 797 743 665 662 652 ...
	629 609 596 590 582 547 486 471 462 435 ...
	424 403 400 386 386 384 384 383 370 365 ...
	360 358 354 347 320 319 318 311 307 290 ].';

m = length(y); % store the number of training examples
x = [ ones(m,1) x]; % Add a column of ones to x
n = size(x,2); % number of features
theta_vec = [0 0]';
alpha = 0.002;
err = [0 0]';
for kk = 1:10000
	h_theta = (x*theta_vec);
	h_theta_v = h_theta*ones(1,n);
	y_v = y*ones(1,n);
	theta_vec = theta_vec - alpha*1/m*sum((h_theta_v - y_v).*x).';
	err(:,kk) = 1/m*sum((h_theta_v - y_v).*x).';
end

figure;
plot(x(:,2),y,'bs-');
hold on
plot(x(:,2),x*theta_vec,'rp-');
legend('measured', 'predicted');
grid on;
xlabel('Page index, x');
ylabel('Page views, y');
title('Measured and predicted page views');

The computed values are

With this hypothesis, the predicted page views are shown in the red curve (in the below plot).

In the Matlab code snippet, the number of steps of gradient descent is kept blindly at 10000. One can probably stop the gradient descent when the cost function $J(\theta)$ is small and/or when the rate of change of $J(\theta)$ is small.
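One way to replace the fixed iteration count is to monitor the cost and stop once it changes by less than a tolerance between two steps. The Python sketch below shows the idea on a small made-up data set; the function names, the tolerance `tol` and the data are assumptions for illustration.

```python
def cost(theta, x, y):
    """J(theta) = 1/(2m) * sum of squared prediction errors."""
    m = len(x)
    return sum((theta[0] + theta[1] * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def batch_gd_with_stop(x, y, alpha, tol=1e-12, max_iter=100000):
    """Batch gradient descent that stops when the cost stops changing."""
    m = len(x)
    theta = [0.0, 0.0]
    j_prev = cost(theta, x, y)
    for k in range(1, max_iter + 1):
        err = [theta[0] + theta[1] * xi - yi for xi, yi in zip(x, y)]
        theta[0] -= alpha * sum(err) / m
        theta[1] -= alpha * sum(e * xi for e, xi in zip(err, x)) / m
        j_now = cost(theta, x, y)
        if abs(j_prev - j_now) < tol:   # rate of change of J is small
            return theta, k
        j_prev = j_now
    return theta, max_iter

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]           # roughly y = 1 + 2x with small errors
theta, n_steps = batch_gd_with_stop(x, y, alpha=0.1)
```

The returned `n_steps` shows how many updates were actually needed, which is typically far fewer than a blindly chosen upper limit.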

Couple of things to note :

1. Given that the measured values are showing an exponential trend, trying to fit a straight line does not seem like a good idea. Anyhow, given this is the first post in this series, I let it pass.

2. The value of $\alpha$ controls the rate of convergence of the algorithm. If $\alpha$ is very small, the algorithm takes small steps and takes a longer time to converge. A higher value of $\alpha$ causes the algorithm to take large steps, and may cause the algorithm to diverge.

3. I have not figured out how to select a value of $\alpha$ suitable (fast convergence) for the data set under consideration. Will figure that out later.
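One standard result (not from the original post) gives an upper limit on the step size: for a quadratic cost such as $J(\theta)$, gradient descent with a fixed $\alpha$ converges only if $\alpha < 2/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of $\frac{1}{m}X^T X$. The Python sketch below evaluates that bound for the page index data, with variable names chosen for this example.

```python
import math

x = list(range(1, 51))                 # page indices, second column of X
m = len(x)

# entries of H = X^T X / m for X = [ones, x]
h00 = 1.0
h01 = sum(x) / m
h11 = sum(xi * xi for xi in x) / m

# eigenvalues of the symmetric 2x2 matrix from its trace and determinant
trace = h00 + h11
det = h00 * h11 - h01 ** 2
lam_max = 0.5 * (trace + math.sqrt(trace ** 2 - 4.0 * det))
lam_min = det / lam_max

alpha_max = 2.0 / lam_max              # step sizes above this diverge
```

For this data the bound comes out slightly above 0.002, so the value used in the snippet sits just inside the stable range. The large ratio between `lam_max` and `lam_min` also explains why many iterations are needed; scaling the page index to a smaller range would be one way to reduce it.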

Plotting the variation of $J(\theta)$ for different values of $\theta$

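% run after the snippet above: x, y and m must still be in the workspace
% (a clear at this point would erase them)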
j_theta = zeros(250, 250);   % initialize j_theta
theta0_vals = linspace(-5000, 5000, 250);
theta1_vals = linspace(-200, 200, 250);
for i = 1:length(theta0_vals)
	  for j = 1:length(theta1_vals)
		theta_val_vec = [theta0_vals(i) theta1_vals(j)]';
		h_theta = (x*theta_val_vec);
		j_theta(i,j) = 1/(2*m)*sum((h_theta - y).^2);
    end
end
figure;
surf(theta0_vals, theta1_vals,10*log10(j_theta.'));
xlabel('theta_0'); ylabel('theta_1');zlabel('10*log10(Jtheta)');
title('Cost function J(theta)');
figure;
contour(theta0_vals,theta1_vals,10*log10(j_theta.'))
xlabel('theta_0'); ylabel('theta_1')
title('Cost function J(theta)');

Given that the surf() plot is a bit unwieldy on my relatively slow desktop, using the contour() plot seems to be a much better choice. We can see that the minima of this cost function lies near the computed values of $\theta$.

References

An Application of Supervised Learning – Autonomous Deriving