Using cut() with group_by()


Question

I am trying to bin a continuous variable into intervals, varying the cut value based on the group of the observation. A similar question has been asked previously, but it only dealt with a single grouping column. I want a solution that generalises to the group_by() function in dplyr, which allows multiple columns to be selected for the grouping.

Here is a basic example dataset:

df <- data.frame(group = c(rep("Group 1", 10),
                           rep("Group 2", 10)),
                 subgroup = c(1,2),
                 value = 1:20)

This creates:

     group subgroup value
1  Group 1        1     1
2  Group 1        2     2
3  Group 1        1     3
4  Group 1        2     4
5  Group 1        1     5
6  Group 1        2     6
7  Group 1        1     7
8  Group 1        2     8
9  Group 1        1     9
10 Group 1        2    10
11 Group 2        1    11
12 Group 2        2    12
13 Group 2        1    13
14 Group 2        2    14
15 Group 2        1    15
16 Group 2        2    16
17 Group 2        1    17
18 Group 2        2    18
19 Group 2        1    19
20 Group 2        2    20

For the purpose of this question, let's assume that we want to assign each observation a value of 1 or 2, depending on whether its value is below or above the mean value of its group. The grouping should be done by group and subgroup, with an expected output of:

     group subgroup value cut
1  Group 1        1     1   1
2  Group 1        2     2   1
3  Group 1        1     3   1
4  Group 1        2     4   1
5  Group 1        1     5   1
6  Group 1        2     6   2
7  Group 1        1     7   2
8  Group 1        2     8   2
9  Group 1        1     9   2
10 Group 1        2    10   2
11 Group 2        1    11   1
12 Group 2        2    12   1
13 Group 2        1    13   1
14 Group 2        2    14   1
15 Group 2        1    15   1
16 Group 2        2    16   2
17 Group 2        1    17   2
18 Group 2        2    18   2
19 Group 2        1    19   2
20 Group 2        2    20   2

I was hoping for something along the lines of:

df %>%
  group_by(group, subgroup) %>%
  # INSERT MAGIC FUNCTION TO BIN DATA

Recommended answer

If you want to use cut(), you could do it this way:

library(dplyr)

df %>%
  group_by(group, subgroup) %>%
  # Two bins per group: (-Inf, mean] is labelled 1 and (mean, Inf) is labelled 2
  mutate(bin = cut(value, breaks = c(-Inf, mean(value), Inf), labels = c(1, 2)))
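
One detail worth checking is what happens to values that are exactly equal to their group mean. The sketch below rebuilds the example data and applies the same cut() call. The names `binned`, `binned_left` and `group_mean` are illustrative and not part of the original answer. It relies on cut() using right-closed intervals by default, and on the `right = FALSE` argument to switch to left-closed intervals.

```r
library(dplyr)

# Example data from the question (subgroup alternates 1, 2 within each group)
df <- data.frame(group = rep(c("Group 1", "Group 2"), each = 10),
                 subgroup = rep(c(1, 2), times = 10),
                 value = 1:20)

# Default: intervals are right-closed, i.e. (-Inf, mean] and (mean, Inf)
binned <- df %>%
  group_by(group, subgroup) %>%
  mutate(group_mean = mean(value),
         bin = cut(value, breaks = c(-Inf, mean(value), Inf), labels = c(1, 2))) %>%
  ungroup()

# Rows whose value equals the mean of their group/subgroup land in bin 1
filter(binned, value == group_mean)

# right = FALSE makes the intervals left-closed, so those ties move to bin 2
binned_left <- df %>%
  group_by(group, subgroup) %>%
  mutate(bin = cut(value, breaks = c(-Inf, mean(value), Inf),
                   labels = c(1, 2), right = FALSE)) %>%
  ungroup()
```

In this dataset the ties are the values 5, 6, 15 and 16. The expected table in the question puts 5 in bin 1 but 6 in bin 2, which appears to correspond to splitting on the mean of group alone (5.5 and 15.5). If that is the intended result, grouping with group_by(group) instead of group_by(group, subgroup) should reproduce it.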
