PCA-LDA analysis - R
Problem description
In this example (https://gist.github.com/thigm85/8424654), LDA was compared with PCA on the iris dataset. How can I also run LDA on the PCA results (PCA-LDA)?
Code:
library(MASS)       # lda()
library(ggplot2)
library(scales)     # percent()
library(gridExtra)  # grid.arrange()

# PCA on the four numeric columns, centred and scaled
pca <- prcomp(iris[, -5],
              center = TRUE,
              scale. = TRUE)
prop.pca <- pca$sdev^2 / sum(pca$sdev^2)

# LDA on the original measurements with equal class priors
lda <- lda(Species ~ .,
           iris,
           prior = c(1, 1, 1) / 3)
prop.lda <- lda$svd^2 / sum(lda$svd^2)
plda <- predict(object = lda,
                newdata = iris)

# Combine PCA scores and LDA scores for plotting
dataset <- data.frame(species = iris[, "Species"],
                      pca = pca$x, lda = plda$x)

p1 <- ggplot(dataset) +
  geom_point(aes(lda.LD1, lda.LD2, colour = species, shape = species), size = 2.5) +
  labs(x = paste("LD1 (", percent(prop.lda[1]), ")", sep = ""),
       y = paste("LD2 (", percent(prop.lda[2]), ")", sep = ""))

p2 <- ggplot(dataset) +
  geom_point(aes(pca.PC1, pca.PC2, colour = species, shape = species), size = 2.5) +
  labs(x = paste("PC1 (", percent(prop.pca[1]), ")", sep = ""),
       y = paste("PC2 (", percent(prop.pca[2]), ")", sep = ""))

grid.arrange(p1, p2)
Recommended answer
Usually you do PCA-LDA to reduce the dimensions of your data with PCA before performing LDA. Ideally you decide how many leading components, k, to keep from the PCA. In your example with iris, we take the first 2 components; if you kept all of them, the result would look pretty much the same as without PCA.
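One common way to choose k is to keep the smallest number of components whose cumulative share of variance passes a cutoff. A minimal sketch, assuming a 95% cutoff (the `threshold` value and the variable names are illustrative and not part of the original answer):

```r
# PCA on the four numeric iris columns, centred and scaled as in the question
pca <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

# Proportion of variance per component and its running total
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(prop_var)

# Assumed cutoff: keep enough components to explain at least 95% of the variance
threshold <- 0.95
k <- which(cum_var >= threshold)[1]

print(round(cum_var, 3))
print(k)
```

With this cutoff the iris data gives k = 2, which agrees with keeping two components; a different cutoff or dataset would generally give a different k.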
Try this:
# LDA on the first two principal components
pcdata <- data.frame(pca$x[, 1:2], Species = iris$Species)
pc_lda <- lda(Species ~ ., data = pcdata, prior = c(1, 1, 1) / 3)
prop_pc_lda <- pc_lda$svd^2 / sum(pc_lda$svd^2)
pc_plda <- predict(object = pc_lda, newdata = pcdata)

# Plot the discriminant scores obtained from the PCA-reduced data
dataset <- data.frame(species = iris[, "Species"], pc_plda$x)
p3 <- ggplot(dataset) +
  geom_point(aes(LD1, LD2, colour = species, shape = species), size = 2.5) +
  labs(x = paste("LD1 (", percent(prop_pc_lda[1]), ")", sep = ""),
       y = paste("LD2 (", percent(prop_pc_lda[2]), ")", sep = ""))
print(p3)
You don't see much of a difference here because the first 2 components of the PCA capture most of the variance in the iris dataset.
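To put a number on the comparison, one option is to look at leave-one-out classification accuracy for LDA on the raw measurements and for LDA on the first two principal components. A rough sketch, assuming `MASS::lda` with `CV = TRUE`; the variable names are illustrative, and since the PCA is fitted once on all rows rather than inside each fold, treat the result as indicative only:

```r
library(MASS)

# Same PCA as in the question, reduced to the first two components
pca <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)
pcdata <- data.frame(pca$x[, 1:2], Species = iris$Species)

# CV = TRUE makes lda() return leave-one-out predictions in $class
cv_raw <- lda(Species ~ ., data = iris, prior = c(1, 1, 1) / 3, CV = TRUE)
cv_pca <- lda(Species ~ ., data = pcdata, prior = c(1, 1, 1) / 3, CV = TRUE)

# Share of flowers assigned to their true species
acc_raw <- mean(cv_raw$class == iris$Species)
acc_pca <- mean(cv_pca$class == pcdata$Species)

print(c(raw = acc_raw, pca_lda = acc_pca))
print(table(predicted = cv_pca$class, actual = pcdata$Species))
```

If the two accuracies are close, little discriminating information was lost by dropping the remaining components.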