How to plot parallel coordinates with multiple categorical variables in R


Problem description


I am having difficulty plotting a parallel coordinates plot using ggparcoord from the GGally package. As there are two categorical variables, what I want to show in the visualisation is like the image below. I've found that in ggparcoord, groupColumn accepts only a single variable to group (colour) by. I can use showPoints to mark the values on the axes, but I also need to vary the shape of these markers according to the categorical variables. Is there another package that can help me realise my idea?

Any response will be appreciated! Thanks!

Solution

It's not that difficult to roll your own parallel coordinates plot in ggplot2, which will give you the flexibility to customize the aesthetics. Below is an illustration using the built-in diamonds data frame.

To get parallel coordinates, you need to add an ID column so you can identify each row of the data frame, which we'll use as a group aesthetic in ggplot. You also need to scale the numeric values so that they'll all be on the same vertical scale when we plot them. Then you need to take all the columns that you want on the x-axis and reshape them to "long" format. We do all that on the fly below with the tidyverse/dplyr pipe operator.

Even after limiting the number of category combinations, the lines are probably too intertwined for this plot to be easily interpretable, so consider this merely a "proof of concept". Hopefully, you can create something more useful with your data. I've used colour (for the lines) and fill (for the points) aesthetics below. You can use shape or linetype instead, depending on your needs.

library(tidyverse)
theme_set(theme_classic())

# Get 20 random rows from the diamonds data frame after limiting
#  to two levels each of cut and color
set.seed(2)
ds = diamonds %>% 
  filter(color %in% c("D","J"), cut %in% c("Good", "Premium")) %>%
  sample_n(20)

ggplot(ds %>% 
         mutate(ID = 1:n()) %>%             # Add ID for each row
         mutate_if(is.numeric, scale) %>%   # Scale numeric columns
         gather(key, value, c(1,5:10)),     # Reshape to "long" format
       aes(key, value, group=ID, colour=color, fill=cut)) +
  geom_line() +
  geom_point(size=2, shape=21, colour="grey50") +
  scale_fill_manual(values=c("black","white"))
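To vary the marker shape by a categorical variable, as the question asks, the same approach can map shape (and linetype) to cut while keeping colour for color. The following is only a minimal sketch, not the code above: it assumes dplyr >= 1.0 and tidyr >= 1.0 and swaps in slice_sample(), across() and pivot_longer() for sample_n(), mutate_if() and gather(). The names num_cols and ds_long are made up for illustration, and the random sample may differ from the one drawn above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(2)
ds <- diamonds %>%
  filter(color %in% c("D", "J"), cut %in% c("Good", "Premium")) %>%
  slice_sample(n = 20)

# Numeric columns to place on the x-axis (illustrative name)
num_cols <- c("carat", "depth", "table", "price", "x", "y", "z")

ds_long <- ds %>%
  mutate(
    ID    = row_number(),                 # one ID per original row
    # cut and color are ordered factors in diamonds; plain factors avoid
    # the ordinal-scale warning for shape and drop the unused levels
    cut   = factor(cut, ordered = FALSE),
    color = factor(color, ordered = FALSE)
  ) %>%
  # as.numeric() drops the matrix attributes that scale() returns
  mutate(across(all_of(num_cols), ~ as.numeric(scale(.x)))) %>%
  pivot_longer(all_of(num_cols), names_to = "key", values_to = "value")

p <- ggplot(ds_long, aes(key, value, group = ID)) +
  geom_line(aes(colour = color, linetype = cut)) +
  geom_point(aes(colour = color, shape = cut), size = 2) +
  theme_classic()

print(p)
```

Mapping both shape and linetype to cut is redundant on purpose: with this many crossing lines, the linetype helps trace a line between axes while the shape identifies it at each axis.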

I haven't used ggparcoord before, but the only option that seemed straightforward (at least on my first try with the function) was to paste together two columns of data. Below is an example. Even with just four category combinations, the plot is confusing, but maybe it will be interpretable if there are strong patterns in your data:

library(GGally)

ds$group = with(ds, paste(cut, color, sep="-"))

ggparcoord(ds, columns=c(1, 5:10), groupColumn=11) +
  theme(panel.grid.major.x=element_line(colour="grey70"))
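A slightly more defensive variant of the same idea, offered as a sketch under assumptions rather than a tested recipe: build the combined group with base R's interaction() so it is a factor with only the observed combinations, and refer to the grouping column by name instead of by position, so the call does not break if columns are added or reordered. It assumes that groupColumn accepts a column name as well as an index. The sample drawn here is independent of the one above.

```r
library(GGally)
library(ggplot2)

set.seed(2)
ds <- subset(diamonds, color %in% c("D", "J") & cut %in% c("Good", "Premium"))
ds <- ds[sample(nrow(ds), 20), ]

# Combine the two categorical variables into one factor;
# drop = TRUE keeps only combinations that actually occur
ds$group <- interaction(ds$cut, ds$color, sep = "-", drop = TRUE)

p <- ggparcoord(ds,
                columns = c(1, 5:10),     # carat, depth, table, price, x, y, z
                groupColumn = "group",    # referenced by name, not position
                showPoints = TRUE) +
  theme(panel.grid.major.x = element_line(colour = "grey70"))

print(p)
```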
