Categorize continuous variable with dplyr

Views: 31. Published: 2021/12/23 12:21:12. Tags: r, dplyr

Question

I want to create a new variable with 3 arbitrary categories based on continuous data.

set.seed(123)
df <- data.frame(a = rnorm(100))

Using base R, I would do:

df$category[df$a < 0.5] <- "low"
df$category[df$a > 0.5 & df$a < 0.6] <- "middle"
df$category[df$a > 0.6] <- "high"

Is there a dplyr, I guess mutate(), solution for this?

Furthermore, is there a way to calculate the categories rather than choosing them? I.e. let R calculate where the breaks for the categories should be.

Edit

The answer is in this thread; however, it does not involve labelling, which confused me (and may confuse others), so I think this question serves a purpose.

Recommended answer

To convert from numeric to categorical, use cut. In your particular case, you want:

df$category <- cut(df$a, 
                   breaks=c(-Inf, 0.5, 0.6, Inf), 
                   labels=c("low","middle","high"))
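
One detail worth knowing when choosing breaks: by default cut() treats each interval as closed on the right (right = TRUE), so a value exactly equal to 0.5 is labelled "low" and a value exactly equal to 0.6 is labelled "middle". The strict inequalities in the question would instead leave such values as NA. The short sketch below uses a few made-up boundary values (x is hypothetical, not part of the original data) to show the default behaviour and how right = FALSE changes it.

```r
# Hypothetical boundary values, chosen only to illustrate interval closure
x <- c(0.5, 0.55, 0.6)
brks <- c(-Inf, 0.5, 0.6, Inf)
labs <- c("low", "middle", "high")

# Default right = TRUE: intervals are (-Inf, 0.5], (0.5, 0.6], (0.6, Inf]
right_closed <- cut(x, breaks = brks, labels = labs)

# right = FALSE: intervals are [-Inf, 0.5), [0.5, 0.6), [0.6, Inf)
left_closed <- cut(x, breaks = brks, labels = labs, right = FALSE)

print(data.frame(x, right_closed, left_closed))
```

Which side is closed only matters for values that fall exactly on a break, but it is worth deciding deliberately when the data are rounded or discrete.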

Or, using dplyr:

library(dplyr)
res <- df %>% mutate(category=cut(a, breaks=c(-Inf, 0.5, 0.6, Inf), labels=c("low","middle","high")))
##               a category
##1   -0.560475647      low
##2   -0.230177489      low
##3    1.558708314     high
##4    0.070508391      low
##5    0.129287735      low
## ...
##35   0.821581082     high
##36   0.688640254     high
##37   0.553917654   middle
##38  -0.061911711      low
##39  -0.305962664      low
##40  -0.380471001      low
## ...
##96  -0.600259587      low
##97   2.187332993     high
##98   1.532610626     high
##99  -0.235700359      low
##100 -1.026420900      low
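
The question also asks whether R can work out the breaks itself, which the answer above does not cover. One possible approach, sketched here under the assumption that three groups of roughly equal size (terciles) are what is wanted, is to take the breaks from quantile() and pass them to cut(). dplyr's ntile() gives the same kind of equal-count grouping as integer bucket numbers. The column names category_q and bucket are illustrative only.

```r
library(dplyr)

set.seed(123)
df <- data.frame(a = rnorm(100))

# Assumption: "calculated" categories means equal-count groups (terciles)
brks <- quantile(df$a, probs = c(0, 1/3, 2/3, 1))

res <- df %>%
  mutate(
    # include.lowest = TRUE keeps the minimum value inside the first interval
    category_q = cut(a, breaks = brks,
                     labels = c("low", "middle", "high"),
                     include.lowest = TRUE),
    # ntile() assigns rank-based bucket numbers 1..3 of near-equal size
    bucket = ntile(a, 3)
  )

# Count how many rows landed in each computed category
print(table(res$category_q))
```

If equal-width rather than equal-count bins are preferred, passing a single number, as in cut(a, breaks = 3), asks cut() to split the range of a into that many intervals of equal width.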
