R: plotting posterior classification probabilities of a linear discriminant analysis in ggplot2


Problem description


Using ggord one can make nice linear discriminant analysis ggplot2 biplots (cf chapter 11, Fig 11.5 in "Biplots in practice" by M. Greenacre), as in

library(MASS)
install.packages("devtools")
library(devtools)
install_github("fawda123/ggord")
library(ggord)
data(iris)
ord <- lda(Species ~ ., iris, prior = rep(1, 3)/3)
ggord(ord, iris$Species)

I would also like to add the classification regions (shown as solid regions of the same colour as their respective group with say alpha=0.5) or the posterior probabilities of class membership (with alpha then varying according to this posterior probability and the same colour as used for each group) (as can be done in BiplotGUI, but I am looking for a ggplot2 solution). Would anyone know how to do this with ggplot2, perhaps using geom_tile ?

EDIT: below someone asks how to calculate the posterior classification probabilities & predicted classes. This goes like this:

library(MASS)
library(ggplot2)
library(scales)
fit <- lda(Species ~ ., data = iris, prior = rep(1, 3)/3)
# Predicted classes together with the discriminant scores (LD1, LD2)
datPred <- data.frame(Species = predict(fit)$class, predict(fit)$x)
# Create decision boundaries: refit on the scores and predict over a 300 x 300 grid
fit2 <- lda(Species ~ LD1 + LD2, data = datPred, prior = rep(1, 3)/3)
ld1lim <- expand_range(c(min(datPred$LD1), max(datPred$LD1)), mul = 0.05)
ld2lim <- expand_range(c(min(datPred$LD2), max(datPred$LD2)), mul = 0.05)
ld1 <- seq(ld1lim[[1]], ld1lim[[2]], length.out = 300)
ld2 <- seq(ld2lim[[1]], ld2lim[[2]], length.out = 300)  # was ld1lim[[2]], a typo
newdat <- expand.grid(list(LD1 = ld1, LD2 = ld2))
preds <- predict(fit2, newdata = newdat)
predclass <- preds$class
postprob <- preds$posterior
# One row per grid point: coordinates, predicted class, class index, posteriors
df <- data.frame(x = newdat$LD1, y = newdat$LD2, class = predclass)
df$classnum <- as.numeric(df$class)
df <- cbind(df, postprob)
head(df)

           x        y     class classnum       setosa   versicolor virginica
1 -10.122541 -2.91246 virginica        3 5.417906e-66 1.805470e-10         1
2 -10.052563 -2.91246 virginica        3 1.428691e-65 2.418658e-10         1
3  -9.982585 -2.91246 virginica        3 3.767428e-65 3.240102e-10         1
4  -9.912606 -2.91246 virginica        3 9.934630e-65 4.340531e-10         1
5  -9.842628 -2.91246 virginica        3 2.619741e-64 5.814697e-10         1
6  -9.772650 -2.91246 virginica        3 6.908204e-64 7.789531e-10         1

colorfun <- function(n, l = 65, c = 100) { hues = seq(15, 375, length = n + 1); hcl(h = hues, l = l, c = c)[1:n] } # default ggplot2 colours
colors <- colorfun(3)
colorslight <- colorfun(3, l = 90, c = 50)
# show.legend and guide = "none" replace the older show_guide = FALSE and guide = F
ggplot(datPred, aes(x = LD1, y = LD2)) +
    geom_raster(data = df, aes(x = x, y = y, fill = factor(class)), alpha = 0.7, show.legend = FALSE) +
    geom_contour(data = df, aes(x = x, y = y, z = classnum), colour = "red2", alpha = 0.5, breaks = c(1.5, 2.5)) +
    geom_point(data = datPred, size = 3, aes(pch = Species, colour = Species)) +
    scale_x_continuous(limits = ld1lim, expand = c(0, 0)) +
    scale_y_continuous(limits = ld2lim, expand = c(0, 0)) +
    scale_fill_manual(values = colorslight, guide = "none")

(I am not totally sure that showing classification boundaries with contours at breaks 1.5 and 2.5 is always correct. It is correct for the boundaries between species 1 and 2 and between species 2 and 3, but not if the region of species 1 were adjacent to that of species 3, because the class number would then jump from 1 to 3 and I would get two boundaries there. Maybe I would have to use an approach in which the boundary between each pair of species is considered separately.)
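A minimal sketch of one way around the adjacency problem described above, not taken from the original post: for each class, take its posterior minus the largest competing posterior and draw the zero contour of that margin. The zero level outlines that class's region regardless of which classes border it, so it should carry over to any number of groups. The names `scores`, `grid`, `margins` and `p_bound` are made up for this illustration, and the grid is kept coarse (100 x 100) so the example runs quickly.

```r
library(MASS)

# Fit LDA and refit on the discriminant scores, as in the question
fit <- lda(Species ~ ., data = iris, prior = rep(1, 3) / 3)
scores <- data.frame(Species = predict(fit)$class, predict(fit)$x)
fit2 <- lda(Species ~ LD1 + LD2, data = scores, prior = rep(1, 3) / 3)

# Coarse regular grid over the LD1/LD2 plane
grid <- expand.grid(
  LD1 = seq(min(scores$LD1), max(scores$LD1), length.out = 100),
  LD2 = seq(min(scores$LD2), max(scores$LD2), length.out = 100)
)
post <- predict(fit2, newdata = grid)$posterior  # one column per class

# Per class: own posterior minus the best competing posterior.
# The margin is positive inside the class region and negative outside it.
margins <- do.call(rbind, lapply(colnames(post), function(k) {
  others <- post[, colnames(post) != k, drop = FALSE]
  data.frame(grid, class = k, margin = post[, k] - apply(others, 1, max))
}))

# Plotting is optional and only runs when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_bound <- ggplot(scores, aes(x = LD1, y = LD2)) +
    geom_contour(data = margins, aes(z = margin, group = class),
                 breaks = 0, colour = "red2", alpha = 0.5) +
    geom_point(aes(colour = Species, shape = Species), size = 3)
  print(p_bound)
}
```

Each boundary segment is drawn twice (once from either side), which is harmless for display but worth knowing if the contours are later used for anything numerical.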

This gets me as far as plotting the classification regions. However, I am looking for a solution that also plots the actual posterior classification probabilities for each species at each coordinate, using alpha (opacity) proportional to the posterior classification probability for each species, and a species-specific colour. In other words, a stack of three superimposed images. As alpha blending in ggplot2 is known to be order-dependent, I think the colours of this stack would have to be calculated beforehand and plotted using something like

qplot(x, y, data=mydata, fill=rgb, geom="raster") + scale_fill_identity() 
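As a rough sketch of what precomputing the colours could look like (an assumption on my part, not the asker's code): take one base colour per class, average the class colours in RGB space weighted by the posterior probabilities, and hand the resulting hex strings to `scale_fill_identity()`. A plain RGB average is only one of several possible mixing rules and is not perceptually uniform. The names `scores`, `grid`, `base_cols`, `mixed_col` and `p_mix` are invented for the example.

```r
library(MASS)

# Fit LDA and refit on the discriminant scores, as in the question
fit <- lda(Species ~ ., data = iris, prior = rep(1, 3) / 3)
scores <- data.frame(Species = predict(fit)$class, predict(fit)$x)
fit2 <- lda(Species ~ LD1 + LD2, data = scores, prior = rep(1, 3) / 3)

# Coarse regular grid over the LD1/LD2 plane
grid <- expand.grid(
  LD1 = seq(min(scores$LD1), max(scores$LD1), length.out = 100),
  LD2 = seq(min(scores$LD2), max(scores$LD2), length.out = 100)
)
post <- predict(fit2, newdata = grid)$posterior  # one column per class

# One base colour per class (default ggplot2 hues) as a classes x 3 matrix in [0, 1]
n <- ncol(post)
base_cols <- hcl(h = seq(15, 375, length.out = n + 1)[1:n], l = 65, c = 100)
base_rgb <- t(col2rgb(base_cols)) / 255

# Posterior-weighted average of the class colours at every grid point,
# clamped to [0, 1] to guard against floating-point overshoot
mixed <- pmin(pmax(post %*% base_rgb, 0), 1)
grid$mixed_col <- rgb(mixed[, 1], mixed[, 2], mixed[, 3])

# Plotting is optional and only runs when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_mix <- ggplot(grid, aes(x = LD1, y = LD2)) +
    geom_raster(aes(fill = mixed_col)) +
    scale_fill_identity() +
    geom_point(data = scores, aes(colour = Species, shape = Species), size = 3) +
    scale_colour_manual(values = base_cols)
  print(p_mix)
}
```

Because the colour of every cell is fixed before plotting, the result does not depend on layer order, and the same code works for any number of classes since `n` is taken from the posterior matrix.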

Here is a SAS example of what I am after:

Would anyone know how to do this perhaps? Or does anyone have any thoughts on how to best represent these posterior classification probabilities?

Note that the method should work for any number of groups, not just for this specific example.

Solution

I also came up with the following easy solution: add a column to df in which the class predictions are drawn at random according to the posterior probabilities. This results in dithering in uncertain regions, e.g. as in

fit = lda(Species ~ Sepal.Length + Sepal.Width, data = iris, prior = rep(1, 3)/3)
ld1lim <- expand_range(c(min(datPred$LD1),max(datPred$LD1)),mul=0.5)
ld2lim <- expand_range(c(min(datPred$LD2),max(datPred$LD2)),mul=0.5)

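with the rest as above, and inserting

# Draw one class per grid point, with probabilities given by that point's posterior
lvls <- levels(df$class)
df$classpprob <- factor(apply(df[, lvls], 1, function(row) sample(lvls, 1, prob = row)), levels = lvls)

# datPred has a Species column (not Group) and the axis limits are ld1lim / ld2lim;
# the hpad / vpad arguments are dropped because current geom_raster does not use them
p <- ggplot(datPred, aes(x = LD1, y = LD2)) +
  geom_raster(data = df, aes(x = x, y = y, fill = classpprob), alpha = 0.7, show.legend = FALSE) +
  geom_point(data = datPred, size = 3, aes(pch = Species, colour = Species)) +
  scale_fill_manual(values = colorslight, guide = "none") +
  scale_x_continuous(limits = ld1lim, expand = c(0, 0)) +
  scale_y_continuous(limits = ld2lim, expand = c(0, 0))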

gives me

A lot easier and clearer than starting to mix colours in some additive or subtractive fashion anyway (which is the part where I still had trouble, and which apparently is not so trivial to do well).
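For comparison, here is a hedged sketch of a deterministic alternative that avoids colour mixing as well; it is not from the original answer. Each grid cell is filled with the colour of its predicted class, and alpha is mapped to the largest posterior probability at that cell, so regions fade out near the decision boundaries. Since there is only one raster layer, the order dependence of alpha blending does not come into play. The names `scores`, `grid`, `maxpost` and `p_alpha`, and the alpha range of 0.1 to 0.8, are arbitrary choices for the illustration.

```r
library(MASS)

# Fit LDA and refit on the discriminant scores, as in the question
fit <- lda(Species ~ ., data = iris, prior = rep(1, 3) / 3)
scores <- data.frame(Species = predict(fit)$class, predict(fit)$x)
fit2 <- lda(Species ~ LD1 + LD2, data = scores, prior = rep(1, 3) / 3)

# Coarse regular grid over the LD1/LD2 plane
grid <- expand.grid(
  LD1 = seq(min(scores$LD1), max(scores$LD1), length.out = 100),
  LD2 = seq(min(scores$LD2), max(scores$LD2), length.out = 100)
)
pred <- predict(fit2, newdata = grid)

# Predicted class and the posterior probability of that class at each grid point
grid$class <- pred$class
grid$maxpost <- apply(pred$posterior, 1, max)

# Plotting is optional and only runs when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_alpha <- ggplot(scores, aes(x = LD1, y = LD2)) +
    geom_raster(data = grid, aes(fill = class, alpha = maxpost)) +
    scale_alpha_continuous(range = c(0.1, 0.8), guide = "none") +
    geom_point(aes(colour = Species, shape = Species), size = 3) +
    guides(fill = "none")
  print(p_alpha)
}
```

Compared with the stochastic dithering above, this gives a smooth gradient and is reproducible without setting a seed, at the cost of showing only the winning class's probability rather than all of them.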
