Split data into groups in R


Question

My data frame looks like this:

plant   distance
one 0
one 1
one 2
one 3
one 4
one 5
one 6
one 7
one 8
one 9
one 9.9
two 0
two 1
two 2
two 3
two 4
two 5
two 6
two 7
two 8
two 9
two 9.5

I want to split the distance values of each plant into groups by a fixed interval (for instance, interval = 3) and compute the percentage of observations in each group. Finally, I want to plot the percentages of each group for each plant as a stacked bar chart.

My code:

library(ggplot2)
library(dplyr)
library(scales)  # provides percent() for the axis labels

dat <- my_data %>%
  mutate(group = factor(cut(distance, seq(0, max(distance), 3), labels = FALSE))) %>%
  group_by(plant, group) %>%
  summarise(percentage = n()) %>%
  mutate(percentage = percentage / sum(percentage))

p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_y_continuous(labels = percent)
p

But group 4 is missing from my plot.

I also found that dat itself was wrong: group 4 came out as NA.

The likely reason is that the range covered by group 4 is shorter than the interval of 3. How can I fix this? Thank you in advance!

Recommended answer

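I have solved the problem. The reason is that cut(distance, seq(0, max(distance), 3), F) does not cover the minimum and maximum values. The breaks are 0, 3, 6, 9, so values above 9 fall outside the last interval, and because the intervals are open on the left by default, 0 is excluded as well. Both cases are returned as NA.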
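To make the cause concrete, here is a minimal base R sketch using only the distances of plant "one" from the question. The variable names breaks and codes are illustrative and not part of the original code.

```r
# Distances for plant "one", copied from the question
distance <- c(0:9, 9.9)

# The breaks stop at the last multiple of 3 that does not exceed max(distance)
breaks <- seq(0, max(distance), 3)
print(breaks)
# 0 3 6 9

# Default intervals are (0,3], (3,6], (6,9]:
# 0 sits on the open left edge and 9.9 lies beyond the last break
codes <- cut(distance, breaks, labels = FALSE)
print(codes)
# NA 1 1 1 2 2 2 3 3 3 NA
```

The first and last values get NA instead of a group number, which is why the expected fourth group never appears in the summary.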

Here is my solution:

dat <- my_data %>%
  mutate(group = factor(cut(
    distance,
    seq(from = min(distance), by = 3, length.out = n() / 3 + 1),
    include.lowest = TRUE
  ))) %>%
  count(plant, group) %>%
  group_by(plant) %>%
  mutate(percentage = n / sum(n))
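As an alternative sketch (not the method above), the breaks could be derived from the maximum distance rather than from the row count, so that the last break always reaches or exceeds the largest value. This assumes the distances are non-negative and that binning should start at 0. The names interval, upper, and percentage are illustrative, and base R's table() and prop.table() stand in for the dplyr pipeline to keep the example self-contained.

```r
# Data from the question
my_data <- data.frame(
  plant = rep(c("one", "two"), each = 11),
  distance = c(0:9, 9.9, 0:9, 9.5)
)

interval <- 3  # bin width, as in the question

# Extend the last break to the first multiple of `interval` at or above the maximum
# (assumption: distances are non-negative and bins start at 0)
upper <- ceiling(max(my_data$distance) / interval) * interval
breaks <- seq(0, upper, by = interval)

# include.lowest = TRUE closes the first bin on the left, so 0 lands in [0,3]
my_data$group <- cut(my_data$distance, breaks, include.lowest = TRUE)

# Share of observations per group within each plant (each row sums to 1)
percentage <- prop.table(table(my_data$plant, my_data$group), margin = 1)
print(percentage)
```

With the sample data this yields four groups per plant, [0,3], (3,6], (6,9] and (9,12], and no NA values. The same breaks vector could be passed to cut() inside the mutate() call if the dplyr pipeline is preferred.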
