Significance level added to matrix correlation heatmap using ggplot2

Question

I wonder how one can add another important and needed layer of information to a matrix correlation heatmap: for example the p value, in the manner of the significance level stars, in addition to the correlation coefficient R (-1 to 1)?
It was NOT INTENDED in this question to put significance level stars OR the p values as text on each square of the matrix, BUT rather to show the significance level graphically, in an out-of-the-box way, on each square of the matrix. I think only those who enjoy the blessing of INNOVATIVE thinking can win the applause for unraveling this kind of solution, so that we have the best way to represent that added component in our "half-of-the-truth" matrix correlation heatmaps. I googled a lot but have never seen a proper, or shall I say "eye-friendly", way to represent the significance level PLUS the standard colour shades that reflect the R coefficient.
The reproducible data set is found here:
http://learnr.wordpress.com/2010/01/26/ggplot2-quick-heatmap-plotting/
The R code is below:

library(ggplot2)
library(plyr) # might not be needed here, but I think it is a must-have package in R
library(reshape2) # to "melt" your dataset
library(scales) # it has a "rescale" function which is needed in heatmaps
library(RColorBrewer) # for convenience of heatmap colours, it reflects your mood sometimes
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba <- as.data.frame(cor(nba[2:ncol(nba)])) # convert the correlation matrix to a dataframe
nba.m <- data.frame(row = rownames(nba), nba) # create a column called "row"
rownames(nba.m) <- NULL # get rid of row names
nba.m <- melt(nba.m, id.vars = "row") # long format: one line per pair of variables
# Put the correlations into categories with cut() and label them for the legend;
# the "value" column is now discrete and not continuous
nba.m$value <- cut(nba.m$value, breaks = c(-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1), include.lowest = TRUE, labels = c("(-0.75,-1)", "(-0.5,-0.75)", "(-0.25,-0.5)", "(0,-0.25)", "(0,0.25)", "(0.25,0.5)", "(0.5,0.75)", "(0.75,1)"))
# Reorder the "row" column (used as the x axis) by converting it to a factor with explicit levels
nba.m$row <- factor(nba.m$row, levels = rev(unique(as.character(nba.m$variable))))
# now plotting
# Why the RdYlGn palette? Look at a battery charge indicator, for example on your mobile or your shaver:
# red when it gets low and back to green when charged. This was the inspiration for this colour set.
ggplot(nba.m, aes(row, variable)) +
  geom_tile(aes(fill = value), colour = "black") +
  scale_fill_brewer(palette = "RdYlGn", name = "Correlation")

The matrix correlation heatmap should look like this:

Hints and ideas to enhance the solution:
- This code might be useful to have an idea about the significance level stars taken from this website:
http://ohiodata.blogspot.de/2012/06/correlation-tables-in-r-flagged-with.html
R code:

mystars <- ifelse(p < .001, "***", ifelse(p < .01, "** ", ifelse(p < .05, "* ", " "))) # so 4 categories  

- The significance level can be added as colour intensity to each square, like the alpha aesthetic, but I don't think this will be easy to interpret and to capture
- Another idea would be to have 4 different sizes of squares corresponding to the stars, giving the smallest to the non-significant and increasing to a full-size square for the highest number of stars
- Another idea is to include a circle inside the significant squares, with the thickness of the circle's line corresponding to the level of significance (the 3 remaining categories), all of them in one colour
- Same as above, but with a fixed line thickness and 3 colours for the 3 remaining significance levels
- Maybe you will come up with better ideas, who knows?
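The "different sizes of squares" idea from the list above can be sketched roughly as follows. This is only an illustration under assumptions: it uses the built-in mtcars data instead of the NBA file, plain Pearson p values from cor.test(), and the names vars, long and p_sizes are made up for the example.

```r
library(ggplot2)

# Illustrative data (an assumption): a few numeric columns of the built-in mtcars
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
dat  <- mtcars[, vars]

# One row per pair of variables, with correlation r and p value p
long <- expand.grid(row = vars, variable = vars, stringsAsFactors = FALSE)
long$r <- mapply(function(a, b) cor(dat[[a]], dat[[b]]),
                 long$row, long$variable, USE.NAMES = FALSE)
long$p <- mapply(function(a, b) {
  if (a == b) NA_real_ else cor.test(dat[[a]], dat[[b]])$p.value
}, long$row, long$variable, USE.NAMES = FALSE)

# Four categories, so the non-significant pairs get their own level instead of NA
long$sig <- cut(long$p, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf), right = FALSE,
                labels = c("p < 0.001", "p < 0.01", "p < 0.05", "n.s."))

# Square size encodes significance, fill colour encodes the correlation
p_sizes <- ggplot(subset(long, !is.na(p)), aes(row, variable)) +
  geom_point(aes(size = sig, fill = r), shape = 22, colour = "grey30") +
  scale_size_manual(values = c("p < 0.001" = 12, "p < 0.01" = 9,
                               "p < 0.05" = 6, "n.s." = 3),
                    name = "Significance") +
  scale_fill_gradient2(low = "red", mid = "white", high = "darkgreen",
                       midpoint = 0, limits = c(-1, 1), name = "Correlation") +
  theme_minimal()
p_sizes
```

The four sizes are arbitrary and would need tuning to the plot dimensions so that the largest square fills a cell.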

Solution

This is just an attempt to move towards the final solution. I plotted the stars here as an indicator of the solution, but as I said, the aim is to find a graphical solution that can speak better than the stars. I just used geom_point and alpha to indicate the significance level, but the problem is that the NAs (which include the non-significant values as well) show up as if they had the three-star level of significance. How can that be fixed? I think that using one colour might be more eye-friendly than using many colours, and it avoids burdening the plot with many details for the eyes to resolve. Thanks in advance.
Here is the plot of my first attempt:

Or might this be better?!

I think the best so far is the one below, until you come up with something better!

As requested, the code below is for the last heatmap:

# Function to get the p values as a whole matrix (not half); Spearman is used here, you can change it to Kendall or Pearson
cor.prob.all <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X, use = "pairwise.complete.obs", method = "spearman")
  r2 <- R^2
  Fstat <- r2 * dfr / (1 - r2)
  P <- 1 - pf(Fstat, 1, dfr)
  P[row(P) == col(P)] <- NA # no p value on the diagonal
  P
}
# Reload the raw data, because nba was overwritten by the correlation data frame above
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
# Change matrices to dataframes
nbar <- as.data.frame(cor(nba[2:ncol(nba)], method = "spearman")) # to a dataframe for r
nbap <- as.data.frame(cor.prob.all(nba[2:ncol(nba)])) # to a dataframe for p values
# Reset rownames
nbar <- data.frame(row = rownames(nbar), nbar) # create a column called "row"
rownames(nbar) <- NULL
nbap <- data.frame(row = rownames(nbap), nbap) # create a column called "row"
rownames(nbap) <- NULL
# Melt
nbar.m <- melt(nbar, id.vars = "row")
nbap.m <- melt(nbap, id.vars = "row")
# Classify (you can classify differently for nbar and for nbap also)
nbar.m$value2 <- cut(nbar.m$value, breaks = c(-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1), include.lowest = TRUE, labels = c("(-0.75,-1)", "(-0.5,-0.75)", "(-0.25,-0.5)", "(0,-0.25)", "(0,0.25)", "(0.25,0.5)", "(0.5,0.75)", "(0.75,1)")) # the labels for the legend
nbap.m$value2 <- cut(nbap.m$value, breaks = c(-Inf, 0.001, 0.01, 0.05), labels = c("***", "** ", "*  ")) # p values above 0.05 become NA
nbar.m <- cbind.data.frame(nbar.m, nbap.m$value, nbap.m$value2) # adding the p value and its cut to the first dataset of R coefficients
names(nbar.m)[5] <- "valuep" # change the column names of the dataframe
names(nbar.m)[6] <- "signif."
nbar.m$row <- factor(nbar.m$row, levels = rev(unique(as.character(nbar.m$variable)))) # reorder the variable factor
# Plotting the matrix correlation heatmap
# Set options for a blank panel (theme()/element_*() replace opts()/theme_*(), which current ggplot2 no longer provides)
po.nopanel <- theme(panel.background = element_blank(), panel.grid.minor = element_blank(), panel.grid.major = element_blank())
pa <- ggplot(nbar.m, aes(row, variable)) +
  geom_tile(aes(fill = value2), colour = "white") +
  scale_fill_brewer(palette = "RdYlGn", name = "Correlation") + # RColorBrewer package
  theme(axis.text.x = element_text(angle = -90)) +
  po.nopanel
pa # check the first plot
# Adding the significance level stars using geom_text
pp <- pa +
  geom_text(aes(label = signif.), size = 2, na.rm = TRUE) # you can play with the size
# Workaround for the alpha aesthetic, if it is a good way to represent the significance level; the same workaround can be applied to the size aesthetic in ggplot2 as well.
# Applying the alpha aesthetic to show significance is a little bit problematic, because we want alpha to be low when the p value is high and vice versa, which can't be done without a workaround
nbar.m$signif. <- scales::rescale(as.numeric(nbar.m$signif.), to = c(0.1, 0.9)) # rescaled to c(0.1, 0.9) to avoid problems with the reciprocal in the next step
nbar.m$signif. <- as.factor(0.09 / nbar.m$signif.) # alpha now behaves as wanted, except that the NA values still show as if at the three-star level; how to fix that?
# Adding the alpha aesthetic in geom_point in the shape of squares (you can improve here)
# You can remove this step; its result is seen as one of the layers in the green heatmap above. Shape 22 is again a square, and you can play with the size accordingly
pp <- pa +
  geom_point(data = nbar.m, aes(alpha = signif.), shape = 22, size = 5, colour = "darkgreen", na.rm = TRUE, show.legend = FALSE)
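One possible way around the NA problem mentioned in the comments above, sketched under assumptions: give the non-significant p values their own category in cut() and map each category to an explicit alpha with scale_alpha_manual(), so that "n.s." becomes fully transparent. The p values, the demo data frame and the alpha levels below are made up for illustration and are not the NBA results.

```r
library(ggplot2)

# Hypothetical p values standing in for nbar.m$valuep (NA = diagonal)
p_values <- c(0.0004, 0.003, 0.02, 0.30, NA)

# A fourth break at Inf keeps p >= 0.05 as its own level instead of NA
sig <- cut(p_values, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf), right = FALSE,
           labels = c("***", "**", "*", "n.s."))

# Explicit alpha per category: no rescaling or reciprocal trick needed
alpha_values <- c("***" = 0.9, "**" = 0.6, "*" = 0.3, "n.s." = 0)

# Small demo strip of tiles; rows with a missing p value are dropped
demo <- data.frame(x = factor(seq_along(p_values)), y = "a", sig = sig)
p_alpha <- ggplot(demo[!is.na(demo$sig), ], aes(x, y)) +
  geom_tile(fill = "grey95", colour = "white") +
  geom_point(aes(alpha = sig), shape = 22, size = 8,
             fill = "darkgreen", colour = NA) +
  scale_alpha_manual(values = alpha_values, name = "Significance")
p_alpha
```

With this mapping, the remaining true NAs (the diagonal) can simply be filtered out before plotting, so they no longer look like the highest significance level.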

I hope that this can be a step forward! Please note:
- Some suggested classifying or cutting the R values differently. OK, we can do that of course, but we still want to show the audience the significance level GRAPHICALLY instead of troubling the eye with the star levels. Can we ACHIEVE that in principle or not?
- Some suggested cutting the p values differently. OK, this can be a choice if showing the 3 levels of significance without troubling the eye fails. Then it might be better to show only significant/non-significant, without levels.
- You might come up with a better idea than the above workaround for the alpha and size aesthetics in ggplot2. Hope to hear from you soon!
- The question is not answered yet, waiting for an innovative solution!
- Interestingly, the "corrplot" package does it! I came up with the graph below using this package. PS: the crossed squares are the non-significant ones, at a significance level of 0.05. But how can we translate this to ggplot2, can we?!

- Or can you draw circles and hide the non-significant ones? How can this be done in ggplot2?!
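As a rough sketch of the last two points, one could draw circles only for the significant pairs and, optionally, cross out the rest the way the corrplot figure does. This is an illustration under assumptions, not a finished answer: it uses the built-in mtcars data, Pearson p values from cor.test(), a 0.05 cut-off, and made-up object names.

```r
library(ggplot2)

# Illustrative data (an assumption): a few numeric columns of the built-in mtcars
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
dat  <- mtcars[, vars]

# Long format: one row per pair, with correlation r and p value p
long <- expand.grid(row = vars, variable = vars, stringsAsFactors = FALSE)
long$r <- mapply(function(a, b) cor(dat[[a]], dat[[b]]),
                 long$row, long$variable, USE.NAMES = FALSE)
long$p <- mapply(function(a, b) {
  if (a == b) NA_real_ else cor.test(dat[[a]], dat[[b]])$p.value
}, long$row, long$variable, USE.NAMES = FALSE)
long$row      <- factor(long$row, levels = vars)
long$variable <- factor(long$variable, levels = rev(vars))

# Split the pairs at the 0.05 level; the diagonal (NA) is in neither set
signif_only <- subset(long, !is.na(p) & p < 0.05)
not_signif  <- subset(long, !is.na(p) & p >= 0.05)

# Circles only where p < 0.05; colour and size both follow the correlation
p_circles <- ggplot(long, aes(row, variable)) +
  geom_tile(fill = "white", colour = "grey85") +
  geom_point(data = signif_only, aes(fill = r, size = abs(r)),
             shape = 21, colour = "grey30") +
  scale_fill_gradient2(low = "red", mid = "white", high = "darkgreen",
                       midpoint = 0, limits = c(-1, 1), name = "Correlation") +
  scale_size(range = c(2, 10), guide = "none") +
  theme_minimal()

# Optional corrplot-style variant: mark the non-significant cells with a cross
p_crossed <- p_circles +
  geom_point(data = not_signif, shape = 4, size = 4, colour = "grey50")
p_circles
```

Passing a filtered data frame to a single layer is what hides the non-significant cells; the background tiles still use the full data, so the grid stays complete.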
