% % Problem epage 70 % Example epage 51 % % Written by: % -- % John L. Weatherwax 2005-08-14 % % email: wax@alum.mit.edu % % Please send comments and especially bug reports to the % above email address. % %----- addpath( '../../Code/eda_data' ); close all; drawnow; clc; clear; load yeast; [n,p] = size(data); corm = corrcoef( data ); % Compute the correlation matrix: % Perform PCA on the correlation matrix: [eigvec,eigval] = eig(corm); eigval = diag(eigval); % extract the diagonal elements % order in descending order eigval = flipud(eigval); eigvec = eigvec(:,p:-1:1); % Do a scree plot. figure, plot(1:length(eigval),eigval,'ko-') %title('Scree Plot') xlabel('Eigenvalue Index - k'); ylabel('Eigenvalue'); saveas( gcf, '../../WriteUp/Graphics/Chapter2/prob_2_4_scree', 'eps' ); % From this plot, dimensionality of 4 seems reasonable. % Now for the percentage of variance explained. pervar = 100*cumsum(eigval)/sum(eigval); % > pervar' % 73.9415 84.3993 90.6658 93.5881 95.2096 96.1828 97.0169 % 97.7369 98.3001 98.7833 99.0903 99.3583 99.5411 99.6848 % 99.8176 99.9182 100.0000 % % pick 5 or so % Now for the broken stick test. % First get the expected lengths/sizes of the eigenvalues. g = zeros(1,p); for k = 1:p, for i = k:p g(k) = g(k) + 1/i; end end g = g/p; % what is the proportion of variance explained. This is for the covariance % method. propvar = eigval/sum(eigval); figure; plot( propvar, 'rx-' ); hold on; plot( g, 'go-' ); xlabel('Eigenvalue Index - k'); saveas( gcf, '../../WriteUp/Graphics/Chapter2/prob_2_4_brokenstick', 'eps' ); % % now find those that explain more than the expected amount. ind = find(propvar' > g); % According to this, only the first one qualifies as accounting for more variance than % would be expected by chance. It explains around 74% of the variance. % Now for the size of the variance. avgeig = mean(eigval); % Find the length of ind: ind = find(eigval > avgeig); length(ind) % According to this test, the first 3 would be retained. % So, using 3, we will reduce the dimensionality. P = eigvec(:,1:3); Xp = data*P; figure,plot3(Xp(:,1),Xp(:,2),Xp(:,3),'k*') xlabel('PC 1'),ylabel('PC 2'),zlabel('PC 3') grid on axis tight % We could look at them using: figure,plotmatrix(Xp)