Generating a dummy variable from lots of categories

Problem description

So...I have a large data set with a variable that has many categories. I want to create new variables that group some of those categories into one.

I could do that with a conditional statement, but given the number of categories it would take me forever to go one line at a time. Also, while my original variable is numeric, the values themselves are arbitrary codes, so I can't use logical or range statements.

How do I create this conditional variable based on many particular values?

I tried the following, but without success. Below is an example of the different categories I want to group into one.

classes <- c(549,162,210,222,44,96,62,208,525,202,149,442,427,
      564,423,106,422,546,205,560,127,536,34,261,568,
      366,524,401,548,95,156,8,528, 430,527,556,203,554,523,
      501,530,55,252,585,19,540,71,204,502,504, 196,436,48,
      102,526,201,521,23,558,552,118,416,117,216,510,494,
      516,544,518)

So this seemed pretty intuitive to me, but it doesn't work.

df$chem<- cbind(ifelse(df$class == classes ,1,0))

Needless to say, I'm a beginner, and this is probably not so hard to do, but I've been looking for a solution to this particular problem and can't seem to find it. What am I missing? Thanks!

Recommended answer

You want %in% rather than ==.

For example:

df$chem <- ifelse(df$class %in% classes, 1, 0)
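A small sketch with made-up vectors (x and group are hypothetical stand-ins for df$class and classes) shows why == fails here: it pairs elements by position and recycles the shorter vector, whereas %in% checks each element against the whole set. When the longer length is not a multiple of the shorter one, == also emits a warning.

```r
# Hypothetical toy data: four observed codes and a two-code group
x     <- c(1, 2, 3, 4)
group <- c(1, 3)

# `==` compares positionally, recycling `group` to c(1, 3, 1, 3)
x == group
# [1]  TRUE FALSE FALSE FALSE

# `%in%` asks, for each element of x, whether it appears anywhere in group
x %in% group
# [1]  TRUE FALSE  TRUE FALSE
```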

Or rely on the logical-to-numeric conversion (TRUE becomes 1, FALSE becomes 0):

df$chem <- as.numeric(df$class %in% classes)
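A minimal end-to-end sketch, assuming a small made-up data frame and only a few of the codes from classes. It checks that the ifelse() and as.numeric() forms agree, then applies the same %in% idea to several groupings at once through a named list; the group name other and its codes are hypothetical.

```r
# Hypothetical toy data frame with a numeric `class` column
df <- data.frame(class = c(549, 8, 1000, 34, 7))

# A few of the codes from the question's `classes` vector
classes <- c(549, 162, 210, 8, 34)

# Both forms from the answer yield the same 0/1 indicator
df$chem  <- ifelse(df$class %in% classes, 1, 0)
df$chem2 <- as.numeric(df$class %in% classes)
identical(df$chem, df$chem2)
# [1] TRUE

# Several groupings at once: one dummy column per named group
groups <- list(
  chem  = classes,
  other = c(1000, 7)   # hypothetical second group
)
for (g in names(groups)) {
  df[[paste0("is_", g)]] <- as.numeric(df$class %in% groups[[g]])
}
```

Keeping each grouping's codes in one named list means adding a group is one more list entry rather than another chain of conditionals.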

If you want individual dummy variables for all the categories in df$class, you can use the class.ind function in the nnet package (which ships with R as a recommended package).

library(nnet)

class_ind <- class.ind(df$class)
# Optionally, combine the indicator columns with the original data frame
df_ind <- cbind(df, class_ind)
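If nnet is unavailable, a base-R sketch can build a similar one-column-per-category 0/1 matrix by indexing rows of an identity matrix with the factor codes. This is a swapped-in technique rather than nnet's own implementation, and the helper name one_hot and the toy data are hypothetical.

```r
# Hypothetical helper: one 0/1 column per distinct value of x,
# similar in spirit to nnet::class.ind
one_hot <- function(x) {
  f <- factor(x)  # levels are the sorted distinct values
  # Row i of the identity matrix has a single 1 in column i,
  # so picking rows by factor code gives one indicator per observation
  m <- diag(nlevels(f))[as.integer(f), , drop = FALSE]
  colnames(m) <- levels(f)
  m
}

# Hypothetical toy data
df <- data.frame(class = c(549, 8, 34, 8))
ind <- one_hot(df$class)

# Attach the indicator columns to the original data
df_ind <- cbind(df, ind)
```

Columns are named after the sorted distinct values, like class.ind, so the column for code 549 is ind[, "549"].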
