Create column with grouped values based on another column

Question

I'm sure this has been asked before, but I don't know what to search for, so I apologise in advance.

I have the following data frame:

grades <- data.frame(a = 1:40, b = sample(45:100, 40))

Using dplyr, I want to create a new variable that indicates the grade the student received, based on the following criteria: 90-100 = excellent, 80-90 = very good, etc.

I thought I could get that result by nesting ifelse() inside mutate():

grades %>%
mutate(ifelse(b >= 90, "excellent"), 
       ifelse(b >= 80 & b < 90, "very_good"),
       ifelse(b >= 70 & b < 80, "fair"),
       ifelse(b >= 60 & b < 70, "poor", "fail"))

This doesn't work, as I get the error message "argument "no" is missing, with no default". I thought the "no" would be the "fail" at the end, but obviously I'm getting the syntax wrong.
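ifelse() takes three arguments, test, yes and no, and in the attempt above each call is closed right after its yes argument, so no is missing. Below is a minimal sketch of the nested form, assuming dplyr is installed. Every further ifelse() goes in the no slot of the previous one, and "fail" ends up as the no of the innermost call. The small fixed data frame and the column name grade are illustrative assumptions.

```r
library(dplyr)

# Fixed example scores (illustrative assumption), chosen to hit every band
grades <- data.frame(a = 1:6, b = c(95, 90, 85, 72, 60, 45))

# Each further ifelse() sits in the `no` argument of the one before it,
# so "fail" is the `no` of the innermost call
graded <- grades %>%
  mutate(grade = ifelse(b >= 90, "excellent",
                 ifelse(b >= 80, "very_good",
                 ifelse(b >= 70, "fair",
                 ifelse(b >= 60, "poor", "fail")))))

print(graded)
```

Because the outer tests have already caught the higher scores, upper bounds such as b < 90 are not needed in the inner conditions.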

I can get this to work if I first filter the original data individually and then call ifelse(), as follows:

a <- grades %>%
     filter( b >= 90) %>%
     mutate(final = ifelse(b >= 90, "excellent"))

and then rbind a, b, c, etc. Obviously, this isn't how I want to do it, but I wanted to understand the syntax of ifelse(). I'm guessing the latter works because there aren't any values that don't meet the criteria, but I still can't figure out how to get it to work when there is more than one ifelse().
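That guess matches how base R's ifelse() behaves: it only uses no for positions where test is FALSE, so a missing no goes unnoticed when every test is TRUE. A small base R sketch of both cases follows; the scores are made up for illustration.

```r
scores <- c(95, 91, 72)

# After filtering, every test is TRUE, so `no` is never needed
high <- scores[scores >= 90]
all_high <- ifelse(high >= 90, "excellent")
print(all_high)

# With one FALSE test, ifelse() needs `no` and stops with
# 'argument "no" is missing, with no default'
msg <- tryCatch(ifelse(scores >= 90, "excellent"),
                error = function(e) conditionMessage(e))
print(msg)
```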

Answer

Define a vector of break points (levels) and a vector of labels, and then use cut() on the b column:

library(dplyr)
# 6 break points define 5 intervals, one per label
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")
grades %>% mutate(x = cut(b, levels, labels = labels))
    a   b         x
1   1  66      Poor
2   2  78      fair
3   3  97 excellent
4   4  46      Fail
5   5  89 very good
6   6  57      Fail
7   7  80      fair
8   8  98 excellent
9   9 100 excellent
10 10  93 excellent
11 11  59      Fail
12 12  51      Fail
13 13  69      Poor
14 14  75      fair
15 15  72      fair
16 16  48      Fail
17 17  74      fair
18 18  54      Fail
19 19  62      Poor
20 20  64      Poor
21 21  88 very good
22 22  70      Poor
23 23  85 very good
24 24  58      Fail
25 25  95 excellent
26 26  56      Fail
27 27  65      Poor
28 28  68      Poor
29 29  91 excellent
30 30  76      fair
31 31  82 very good
32 32  55      Fail
33 33  96 excellent
34 34  83 very good
35 35  61      Poor
36 36  60      Fail
37 37  77      fair
38 38  47      Fail
39 39  73      fair
40 40  71      fair
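The output above shows how cut() treats scores that sit exactly on a break point: row 7 (80) is "fair", row 22 (70) is "Poor" and row 36 (60) is "Fail". A short base R sketch of why, reusing the same levels and labels vectors; the boundary scores are illustrative.

```r
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")

# 6 break points define 5 intervals, so cut() needs exactly 5 labels
stopifnot(length(labels) == length(levels) - 1)

# With the default right = TRUE, intervals are closed on the right,
# e.g. (80, 90], so a score equal to a break falls into the band below it
boundary <- c(60, 70, 80, 90)
x <- cut(boundary, levels, labels = labels)
print(as.character(x))
```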

Or using data.table:

library(data.table)
setDT(grades)[, x := cut(b, levels, labels)]

Or just in base R:

grades$x <- cut(grades$b, levels, labels)

Note

After taking another close look at your initial approach, I noticed that you need to include right = FALSE in the cut() call, because, for example, 90 points should be "excellent", not "very good". The right argument controls which end of each interval is closed. The default, right = TRUE, closes intervals on the right, so 90 falls into (80, 90], whereas your ifelse() conditions (b >= 80 & b < 90) describe [80, 90). So in dplyr, it would be:

grades %>% mutate(x = cut(b, levels, labels, right = FALSE))

The same applies to the other options: add right = FALSE to the data.table and base R calls.
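A sketch of the base R and data.table versions with right = FALSE, assuming data.table is installed. The two small boundary-score tables grades_df and grades_dt are hypothetical names used only for this illustration.

```r
library(data.table)

levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")

# Hypothetical boundary scores, one copy per approach
grades_df <- data.frame(a = 1:4, b = c(60, 70, 80, 90))
grades_dt <- data.table(a = 1:4, b = c(60, 70, 80, 90))

# Base R: right = FALSE closes each interval on the left, e.g. [80, 90)
grades_df$x <- cut(grades_df$b, levels, labels, right = FALSE)

# data.table: := adds the column by reference
grades_dt[, x := cut(b, levels, labels, right = FALSE)]

print(grades_df)
print(grades_dt)
```

With right = FALSE, a score of exactly 90 now lands in [90, Inf) and is labelled "excellent", matching the b >= 90 condition from the question.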
