Data frame variable order for ggplot

Question

I ask this with some trepidation, but I really have looked at other questions and haven't found an example that seems to work for me.

I would like to have the character labels on the y-axis of a ggplot sorted based on other columns of the data frame. I believe that this is a matter of setting up factors and levels correctly, prior to using ggplot, but I am having difficulty with the specifics of how to do this.

Here is a simplified example (to the point of potentially not seeming to make sense):

library(tidyverse)
library(ggplot2)

set.seed(1)
num_rows <- 12
sample_names <- do.call(paste0, replicate(5, sample(letters, num_rows, TRUE), FALSE))
df1 <- data.frame(region=sample(c("N", "S", "E", "W"), num_rows, replace = TRUE), 
                  sub_region=sample(c("High", "Medium", "Low"), num_rows, replace = TRUE),
                  my_order = seq(1,num_rows), 
                  my_name = sample_names,
                  var_1 = sample(100, num_rows, replace = TRUE))

#try using arrange
df2 <- df1 %>% arrange(factor(df1$region, levels = c("N","E","S","W")), 
                       factor(df1$sub_region, levels = c("High","Medium","Low")))
df2 %>% ggplot() + geom_point(aes(x = var_1, y = my_name, color=sub_region))

#try using order
df3 <- df1
df3$region <- factor(df1$region, levels = c("N","E","S","W"))
df3$sub_region <- factor(df1$sub_region, levels = c("High","Medium","Low"))
df4 <- df3[order(df1$region, df1$sub_region, df1$my_order),]
df4 %>% ggplot() + geom_point(aes(x = var_1, y = my_name, color=sub_region))

I'm hoping to have my_name and the corresponding values sorted in the plot by region, then sub_region, then my_order as a tie-breaker (without, for now at least, showing any of these in the chart). However, my_name continues to appear in alphabetical order, whether I use arrange (from dplyr) or order. I realize that I haven't put in any code for the my_order column, but since the first two levels of the sort aren't working, I thought I would hold off on that.

I am looking for the y-axis to be in this order (from the top, down):

qymni
fswvl
jjkcs
ouasm
xziqg
fqvar
etc.

Clearly, I'm doing something wrong, but I'm not sure what. I would appreciate any help. Also, am I correct that once I have this working, using group_by and summarize from dplyr will preserve the order of my_name?

Solution

First off, you can set the order of factor levels for columns like region in your original dataframe. Then you don't end up with all these different slightly modified versions of the same data. Then sort the dataframe how you want it, and use forcats::fct_inorder to reassign the factor levels for my_name based on their current order in the dataframe:

library(tidyverse)
library(ggplot2)
library(forcats)

set.seed(1)
num_rows <- 12
sample_names <- do.call(paste0, replicate(5, sample(letters, num_rows, TRUE), FALSE))
df1 <- data.frame(region=sample(c("N", "S", "E", "W"), num_rows, replace = TRUE), 
                  sub_region=sample(c("High", "Medium", "Low"), num_rows, replace = TRUE),
                  my_order = seq(1,num_rows), 
                  my_name = sample_names,
                  var_1 = sample(100, num_rows, replace = TRUE))

# Fix the level order of the sorting columns once, in the original data frame
df1$region <- factor(df1$region, levels = c("N","E","S","W"))
df1$sub_region <- factor(df1$sub_region, levels = c("High","Medium","Low"))
# Sort in reverse, because ggplot draws the first level at the bottom of the y-axis
df1 <- df1[order(df1$region, df1$sub_region, df1$my_order, decreasing = TRUE), ]
# Order my_name levels based on current row order
df1$my_name <- fct_inorder(df1$my_name)
df1 %>% ggplot() + geom_point(aes(x = var_1, y = my_name, color = sub_region))

Note that I had to use decreasing = TRUE in the order() call to get the order going top to bottom, because ggplot places the first factor level at the bottom of the y-axis.
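If reversing the sort feels awkward, one alternative is to sort in the natural ascending order and then flip the factor levels with forcats::fct_rev. The following is only a sketch of that idea in a dplyr pipeline; the data frame `toy` and its values are made up for illustration and are not the data from the question.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical toy data (names and values invented for this sketch)
toy <- data.frame(
  region   = c("S", "N", "E", "N"),
  my_order = 1:4,
  my_name  = c("ccc", "aaa", "ddd", "bbb"),
  var_1    = c(10, 40, 20, 30)
)

toy_sorted <- toy %>%
  mutate(region = factor(region, levels = c("N", "E", "S", "W"))) %>%
  arrange(region, my_order) %>%                     # ascending: N rows first
  mutate(my_name = fct_rev(fct_inorder(my_name)))   # first row becomes the last level

# Rows are now aaa, bbb, ddd, ccc; the levels are the reverse of that
levels(toy_sorted$my_name)
#> [1] "ccc" "ddd" "bbb" "aaa"

# The last level is drawn at the top, so "aaa" ends up at the top of the y-axis
p <- ggplot(toy_sorted) +
  geom_point(aes(x = var_1, y = my_name, color = region))
```

Either approach should give the same axis; this one just keeps the sort itself in the order you would read it.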

For categorical variables like my_name, it's the order of the factor levels that determines the order ggplot plots them in, not the current row order in the dataframe, which is what you were changing in your example code. This makes the tools in forcats very useful when you need to control the order in a plot.
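To make the difference between row order and level order concrete, here is a small sketch on made-up data (the data frame `toy` is hypothetical). It also touches on the group_by/summarise part of the question: as far as I know, summarising by a factor keeps its levels and returns the groups in level order, but it is worth checking on your own data.

```r
library(dplyr)
library(forcats)

# Hypothetical data, already in the row order wanted on the axis
toy <- data.frame(
  my_name = c("qq", "bb", "zz", "bb"),
  var_1   = c(1, 2, 3, 4)
)

# A plain factor() sorts the levels alphabetically and ignores row order
levels(factor(toy$my_name))
#> [1] "bb" "qq" "zz"

# fct_inorder() takes the levels in order of first appearance instead
toy$my_name <- fct_inorder(toy$my_name)
levels(toy$my_name)
#> [1] "qq" "bb" "zz"

# Re-sorting the rows afterwards does not touch the levels
shuffled <- toy[order(toy$var_1, decreasing = TRUE), ]
levels(shuffled$my_name)
#> [1] "qq" "bb" "zz"

# Grouping by the factor keeps its levels; summary rows follow level order
toy_sum <- toy %>%
  group_by(my_name) %>%
  summarise(total = sum(var_1))
as.character(toy_sum$my_name)
#> [1] "qq" "bb" "zz"
```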
