Spread out density plots with ggplot

Question

I saw this great plot from FiveThirtyEight that has a slight overlap of density plots for different colleges. Check out this link at fivethirtyeight.com

How would you replicate this plot with ggplot2?

Specifically, how would you get that slight overlap? facet_wrap isn't going to work.

TestFrame <-  
  data.frame(
    Score =
      c(rnorm(100, 0, 1)
        ,rnorm(100, 0, 2)
        ,rnorm(100, 0, 3)
        ,rnorm(100, 0, 4)
        ,rnorm(100, 0, 5))
    ,Group =
      c(rep('Ones', 100)
        ,rep('Twos', 100)
        ,rep('Threes', 100)
        ,rep('Fours', 100)
        ,rep('Fives', 100))
  )

ggplot(TestFrame, aes(x = Score, group = Group)) +
  geom_density(alpha = .75, fill = 'black')

Solution

As always with ggplot, the key is getting the data in the right format, and then the plotting is pretty straightforward. I'm sure there would be another way to do this, but my approach was to do the density estimation with density() and then to make a sort of manual geom_density() with geom_ribbon(), which takes a ymin and ymax, necessary for moving the shape off the x axis.
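To make the idea concrete, here is a minimal sketch for a single distribution. It is not the full answer, only the core trick: estimate the density with density(), then draw it with geom_ribbon() using a ymin that is lifted off the x axis. The names scores, offset, ribbon and p_single are hypothetical and chosen just for this illustration.

```r
library(ggplot2)

set.seed(1)
scores <- rnorm(200)   # hypothetical sample data
offset <- 0.5          # hypothetical vertical shift for this one shape

d <- density(scores)   # base R kernel density estimate (x and y vectors)

# The ribbon's floor is the offset; its ceiling is the density on top of it
ribbon <- data.frame(x = d$x,
                     ymin = offset,
                     ymax = d$y + offset)

p_single <- ggplot(ribbon, aes(x, ymin = ymin, ymax = ymax)) +
  geom_ribbon(fill = "black", colour = "#F0F0F0")
print(p_single)
```

Giving each group its own offset, as the full code further down does with GroupNum, stacks the shapes so that neighbouring densities overlap slightly.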

The rest of the challenge was in getting the order of the printing correct, since it seems that ggplot will print the widest ribbon first. In the end, the part that requires the bulkiest code is the production of the quartiles.
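As a possible shortcut for the quartile step, the per-group quartiles alone could be computed with summarise() rather than mutate() followed by filter(). This is only a sketch on hypothetical toy data (toy, quartiles and the column name med are made up here), and it does not reproduce the lookup of the matching ribbon heights that the answer's own code performs.

```r
library(dplyr)

set.seed(1)
# Hypothetical toy data: three groups of 100 scores each
toy <- data.frame(Score = rnorm(300),
                  Group = rep(c("A", "B", "C"), each = 100))

# One row per group holding the lower quartile, median and upper quartile
quartiles <- toy %>%
  group_by(Group) %>%
  summarise(q1  = quantile(Score, 0.25, names = FALSE),
            med = median(Score),
            q3  = quantile(Score, 0.75, names = FALSE))
print(quartiles)
```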

I also produced some data that is a bit more consistent with the original figure.

library(ggplot2)
library(dplyr)
library(broom)
rawdata <- data.frame(Score = rnorm(1000, seq(1, 0, length.out = 10), sd = 1),
                      Group = rep(LETTERS[1:10], 100)) #One label per score: 10 groups of 100

df <- rawdata %>% 
  #factor() is needed because Group is a character column in R >= 4.0;
  #rev() means the ordering will be from top to bottom
  mutate(GroupNum = rev(as.numeric(factor(Group)))) %>% 
  group_by(Group, GroupNum) %>% 
  do(tidy(density(.$Score, bw = diff(range(.$Score))/20))) %>% #The original has quite a large bandwidth
  ungroup() %>% 
  mutate(ymin = GroupNum * (max(y) / 1.5), #This constant controls how much overlap between groups there is
         ymax = y + ymin,
         ylabel = ymin + min(ymin)/2,
         xlabel = min(x) - mean(range(x))/2) #This constant controls how far to the left the labels are

#Get quartiles
labels <- rawdata %>% 
  mutate(GroupNum = rev(as.numeric(factor(Group)))) %>% #Same numbering as in df
  group_by(Group, GroupNum) %>% 
  mutate(q1 = quantile(Score)[2],
         median = quantile(Score)[3],
         q3 = quantile(Score)[4]) %>%
  filter(row_number() == 1) %>% #Keep one row per group
  select(-Score) %>% 
  left_join(df, by = c("Group", "GroupNum")) %>% 
  #Find the density point closest to the median, to get the ribbon height there
  mutate(xmed = x[which.min(abs(x - median))],
         yminmed = ymin[which.min(abs(x - median))],
         ymaxmed = ymax[which.min(abs(x - median))]) %>% 
  filter(row_number() == 1)

p <- ggplot(df, aes(x, ymin = ymin, ymax = ymax)) +
  geom_text(data = labels, aes(xlabel, ylabel, label = Group)) +
  geom_vline(xintercept = 0, size = 1.5, alpha = 0.5, colour = "#626262") + 
  geom_vline(xintercept = c(-2.5, -1.25, 1.25, 2.5), size = 0.75, alpha = 0.25, colour = "#626262") + 
  theme(panel.grid = element_blank(),
        panel.background = element_rect(fill = "#F0F0F0"),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank())
for (i in unique(df$GroupNum)) {
  p <- p + geom_ribbon(data = df[df$GroupNum == i,], aes(group = GroupNum), colour = "#F0F0F0", fill = "black") +
    geom_segment(data = labels[labels$GroupNum == i,], aes(x = xmed, xend = xmed, y = yminmed, yend = ymaxmed), colour = "#F0F0F0", linetype = "dashed") +
    geom_segment(data = labels[labels$GroupNum == i,], x = min(df$x), xend = max(df$x), aes(y = ymin, yend = ymin), size = 1.5, lineend = "round") 
}
p <- p + geom_text(data = labels[labels$Group == "A",], aes(xmed - xlabel/50, ylabel), 
                   label = "Median", colour = "#F0F0F0", hjust = 0, fontface = "italic", size = 4)
p #Display the finished plot

Edit: I noticed the original actually does a bit of fudging by stretching out each distribution with a horizontal line (you can see a join if you look closely). I added something similar with the second geom_segment() in the loop.
