Mutate with different cut points by group


Question

I would like to cut a numeric variable using different cut points for different groups.

I have tried combining the cut points into a list, but I suspect I need some combination of a function and a loop. A similar example for recode can be found [here][1], but cut will not accept a list.

Any suggestions?

Spreading into wide format is possible, but I want to know how to do this in long format.

```r
Cutpoints2 <- c(0, 10, 20, 50, 100, 9999)
Cutpoints1 <- c(0, 1, 10, 100, 9999)
Cutpoints <- list(Cutpoints1, Cutpoints2)
Df2 <- Df1 %>%
  group_by(group) %>%
  mutate(varcat = cut(var, Cutpoints))
```


  [1]: http://Www.stackoverflow.com/questions/56636417

Recommended answer

If you mean the base R cut (which makes sense in context), you can use a couple of different methods, depending on how your group variable is encoded and how much typing versus transforming you want to do. (It's hard to tell which will be best, given that you haven't shown us what your data looks like.)

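library(tidyverse)

Cutpoints2 <- c(0, 10, 20, 50, 100, 9999)
Cutpoints1 <- c(0, 1, 10, 100, 9999)

# Test data: 200 values, alternating between groups 1 and 2
test <- tibble(
  numbers = seq(from = 0, to = 99.5, by = 0.5),
  group = rep(c(1, 2), length(numbers) / 2)
)

## Method 1: ifelse
# as.character keeps the interval labels; ifelse would otherwise return factor codes
test %>%
  group_by(group) %>%
  mutate(cut_group =
    ifelse(group == 1,
           cut(numbers, Cutpoints1) %>% as.character,
           cut(numbers, Cutpoints2) %>% as.character)
  )

## Method 2: get
# group[1] is the current group's value, so paste0 builds a single variable name
test %>%
  group_by(group) %>%
  mutate(cut_group =
    cut(numbers,
        get(paste0("Cutpoints", group[1]))) %>% as.character
  )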

If you only have a few cutpoint vectors, then the ifelse approach is a simple way to call cut and annotate your rows with manual references to each cutpoint vector. You have to call as.character because the factors produced by cut don't play well with ifelse, which drops the factor attributes and returns the underlying integer codes. (There may be a way to get rid of it within the function, too, but as.character will work in any case.) If, however, you have a lot of cutpoint vectors, you can use get to grab the value of a variable whose name is passed as a string, which I'm constructing with paste0 here. You could use stringr::str_replace_all to build the name if your groups are encoded as "group1" or something similar.
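To see why as.character matters, here is a small base R sketch. The names breaks_a, breaks_b, x and use_a are made up for this illustration and are not part of the question's data.

```r
# Hypothetical breaks and values, chosen only for this illustration
breaks_a <- c(0, 1, 10, 100)
breaks_b <- c(0, 10, 20, 50, 100)
x <- c(0.5, 5, 50)
use_a <- c(TRUE, FALSE, TRUE)

# Without as.character: ifelse drops the factor class and keeps only the codes
codes <- ifelse(use_a, cut(x, breaks_a), cut(x, breaks_b))

# With as.character: the interval labels survive
interval_labels <- ifelse(use_a,
                          as.character(cut(x, breaks_a)),
                          as.character(cut(x, breaks_b)))

print(codes)
print(interval_labels)
```

The first result is a plain vector of level codes whose meaning differs between the two sets of breaks, which is why converting to character before ifelse combines them is the safer choice.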

In either case, you'll get this result using the test tibble I created:

# A tibble: 200 x 3
# Groups:   group [2]
   numbers group cut_group
     <dbl> <dbl> <chr>    
 1     0       1 NA       
 2     0.5     2 (0,10]   
 3     1       1 (0,1]    
 4     1.5     2 (0,10]   
 5     2       1 (1,10]   
 6     2.5     2 (0,10]   
 7     3       1 (1,10]   
 8     3.5     2 (0,10]   
 9     4       1 (1,10]   
10     4.5     2 (0,10]   
# … with 190 more rows

If you already have all of the cutpoints in a named list of vectors, you would just look them up with Cutpoints[[paste0("Cutpoints", group[1])]] instead of using get. Note that [[ needs a single name, hence group[1], and that the list elements must be named for a lookup by name to work. Otherwise, it's not necessary to wrap them in a list.
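A minimal sketch of the list-based lookup, assuming the list elements are named "Cutpoints1" and "Cutpoints2" (the question's list is unnamed, so the names are an assumption added here). The lowercase cutpoints list and the result variable are illustrative, and first(group) is used so that [[ receives exactly one name per group.

```r
library(dplyr)

# Assumed: a *named* list, one vector of breaks per group
cutpoints <- list(
  Cutpoints1 = c(0, 1, 10, 100, 9999),
  Cutpoints2 = c(0, 10, 20, 50, 100, 9999)
)

# Same test data as in the answer above
test <- tibble(
  numbers = seq(from = 0, to = 99.5, by = 0.5),
  group = rep(c(1, 2), times = 100)
)

result <- test %>%
  group_by(group) %>%
  mutate(cut_group = as.character(
    # first(group) is the current group's value, giving a single list name
    cut(numbers, cutpoints[[paste0("Cutpoints", first(group))]])
  )) %>%
  ungroup()

print(head(result, 5))
```

Keeping the breaks in a list avoids get and makes it explicit which vectors are available, at the cost of having to name the elements consistently with the group values.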
