In R, how to collapse categories or recategorize variables?


Question

I am sure this is a very basic question:

In R I have 600,000 categorical variables, each of which is classified as "0", "1", or "2".

What I would like to do is collapse "1" and "2" and leave "0" by itself, so that after recategorizing "0" = "0", "1" = "1" and "2" = "1". In the end I only want "0" and "1" as categories for each of the variables.

Also, if possible, I would rather not create 600,000 new variables. If I can replace the existing variables with the new values, that would be great!

Thanks!

Recommended answer

There is a function recode in the package car (Companion to Applied Regression):

library(car)  # provides recode()
# map "1" and "2" to "1"; everything else (here "0") becomes "0"
recode(x, "c('1','2')='1'; else='0'")

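Or, for your case, in plain R (as.numeric() on a factor returns the level codes 1, 2, 3, so pmin(..., 2) folds the codes for "1" and "2" together):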

> x <- factor(sample(c("0","1","2"), 10, replace=TRUE))
> x
 [1] 1 1 1 0 1 0 2 0 1 0
Levels: 0 1 2
> factor(pmin(as.numeric(x), 2), labels=c("0","1"))
 [1] 1 1 1 0 1 0 1 0 1 0
Levels: 0 1
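
Another plain R route to the same result is to rename level "2" to "1": assigning a level name that already exists merges the two levels. The following is only a minimal sketch; the helper name collapse_levels is hypothetical and does not come from the original answer.

```r
# Hypothetical helper (not from the original answer): merge level "2" into "1".
collapse_levels <- function(x) {
  x <- as.factor(x)
  # Assigning a level name that already exists merges the two levels.
  levels(x)[levels(x) == "2"] <- "1"
  x
}

x <- factor(c("1", "0", "2", "0", "1"), levels = c("0", "1", "2"))
collapse_levels(x)
```

Unlike the pmin() version, this does not depend on the order of the levels or on all three levels being present in the factor.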

Update: To recode all categorical columns of a data frame tmp, you can use the following:

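# as.numeric() on a factor gives level codes 1, 2, 3; cap them at 2, then relabel
recode_fun <- function(x) factor(pmin(as.numeric(x), 2), labels=c("0","1"))
library(plyr)  # provides catcolwise()
catcolwise(recode_fun)(tmp)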
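
Since the question asks to overwrite the existing variables rather than create new ones, one base R possibility is to assign the recoded columns back into the data frame. This is a sketch under assumptions: the toy data frame below is made up for illustration, the helper name collapse_levels is hypothetical, and the factor columns are assumed to use only the levels "0", "1" and "2".

```r
# Toy data standing in for the real data frame (made up for illustration).
tmp <- data.frame(
  a = factor(c("0", "1", "2", "2")),
  b = factor(c("2", "0", "0", "1")),
  id = 1:4
)

# Hypothetical helper: merge level "2" into level "1".
collapse_levels <- function(x) {
  levels(x)[levels(x) == "2"] <- "1"
  x
}

# Overwrite only the factor columns, leaving other columns untouched.
is_cat <- vapply(tmp, is.factor, logical(1))
tmp[is_cat] <- lapply(tmp[is_cat], collapse_levels)

str(tmp)
```

Assigning into tmp[is_cat] keeps the column names and any non-factor columns (here id) as they were.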
