Split data into groups in R
Problem description
My data frame looks like this:
plant distance
one 0
one 1
one 2
one 3
one 4
one 5
one 6
one 7
one 8
one 9
one 9.9
two 0
two 1
two 2
two 3
two 4
two 5
two 6
two 7
two 8
two 9
two 9.5
For each plant level, I want to split distance into groups of a fixed interval width (for instance, interval = 3) and compute the percentage of rows that fall into each group. Finally, I want to plot the percentage of each group within each plant level as a stacked bar chart.
My code:
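library(ggplot2)
library(dplyr)

dat <- data %>%
  # labels = FALSE makes cut() return integer bin codes instead of interval labels
  mutate(group = factor(cut(distance, seq(0, max(distance), 3), labels = FALSE))) %>%
  group_by(plant, group) %>%
  summarise(percentage = n()) %>%
  # after summarise() the data is still grouped by plant, so sum() is per plant
  mutate(percentage = percentage / sum(percentage))

p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
  geom_bar(stat = "identity", position = "stack") +
  # percent() comes from the scales package, so it is namespaced here
  scale_y_continuous(labels = scales::percent)
p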
But in the resulting plot, group 4 was missing.
I also found that dat was wrong: group 4 was NA.
The likely reason is that group 4 is shorter than the interval width of 3. How can I fix this? Thank you in advance!
Recommended answer
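I have solved the problem. The reason is that cut(distance, seq(0, max(distance), 3), F) did not cover the minimum and maximum values. By default cut() excludes the lowest break, so distance 0 becomes NA, and the sequence of breaks stops at 9, below max(distance), so 9.9 and 9.5 become NA as well.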
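To make the cause concrete, here is a minimal sketch in base R using a few boundary values taken from plant one in the question. The vector and variable names are chosen here for illustration only.

```r
# A few distances from plant "one" that sit on the interesting boundaries.
distance <- c(0, 3, 9, 9.9)

# Same breaks as in the question: 0, 3, 6, 9 (seq() never goes past max(distance)).
breaks <- seq(0, max(distance), 3)

# Default cut(): intervals (0,3], (3,6], (6,9]. Both 0 and 9.9 fall outside.
codes_default <- cut(distance, breaks, labels = FALSE)
print(codes_default)  # NA  1  3 NA

# include.lowest = TRUE closes the first interval to [0,3], so 0 is kept,
# but 9.9 is still beyond the last break and stays NA.
codes_lowest <- cut(distance, breaks, labels = FALSE, include.lowest = TRUE)
print(codes_lowest)   #  1  1  3 NA
```

So include.lowest = TRUE alone is not enough: the breaks also have to extend to at least the maximum distance.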
Here is my solution:
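dat <- my_data %>%
  # Breaks start at min(distance) and step by 3. length.out is derived from the
  # row count n() and is rounded up by seq(), so the breaks extend past the maximum.
  # include.lowest = TRUE keeps the minimum value in the first interval.
  mutate(group = factor(cut(distance,
                            seq(from = min(distance), by = 3, length.out = n() / 3 + 1),
                            include.lowest = TRUE))) %>%
  count(plant, group) %>%
  group_by(plant) %>%
  mutate(percentage = n / sum(n))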
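The solution above sizes the breaks from the number of rows. This works here because there are far more rows than intervals. As a possible alternative (a sketch, not the answer's method), the breaks can be derived from the range of the data. It assumes distances are non-negative and rebuilds the example data from the table in the question. The names width and breaks are introduced here for illustration.

```r
library(dplyr)

# Example data reconstructed from the table in the question.
my_data <- data.frame(
  plant = rep(c("one", "two"), each = 11),
  distance = c(0:9, 9.9, 0:9, 9.5)
)

width <- 3  # interval width from the question

# Breaks run from 0 to the first multiple of `width` at or above the maximum
# (0, 3, 6, 9, 12 here), so the largest distance always falls in the last interval.
breaks <- seq(0, ceiling(max(my_data$distance) / width) * width, by = width)

dat <- my_data %>%
  # include.lowest = TRUE keeps distance 0 in the first interval
  mutate(group = cut(distance, breaks, include.lowest = TRUE)) %>%
  count(plant, group) %>%
  group_by(plant) %>%
  mutate(percentage = n / sum(n)) %>%
  ungroup()

print(dat)
```

With this data, every row is assigned to one of four groups per plant and no group is NA. The resulting dat can be passed to the ggplot() call from the question unchanged.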