One Hot Encoding From Multiple Rows in R

Question

Suppose I have data that has the following format:

ID VALUE
a  a
a  b
d  b
d  c

What I would like to do is one-hot encode VALUE so that there is a single row per ID. When I use model.matrix, I obtain:

model.matrix(~VALUE-1, df)

ID aVALUE bVALUE cVALUE
a  1      0      0
a  0      1      0
d  0      1      0
d  0      0      1

What I would like to get, however, is this:

ID aVALUE bVALUE cVALUE
a  1      1      0
d  0      1      1
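One way to get from the model.matrix() output to the desired layout is to sum the indicator rows within each ID. The sketch below is not from the original answer: it uses base R's rowsum() on the four-row example above and caps the sums at 1; the names mm and collapsed are illustrative. Note that model.matrix() builds a dense matrix with one row per input row, so at 30 million rows this may be memory-heavy.

```r
# Four-row example from the question
df <- data.frame(ID = c("a", "a", "d", "d"),
                 VALUE = c("a", "b", "b", "c"))

# One indicator row per input row; columns VALUEa, VALUEb, VALUEc
mm <- model.matrix(~ VALUE - 1, df)

# Add up the indicator rows within each ID (rows come out sorted by ID)
collapsed <- rowsum(mm, group = df$ID)

# Cap at 1 so an ID/VALUE pair that occurs more than once still reads as 1
collapsed[collapsed > 1] <- 1
collapsed
#   VALUEa VALUEb VALUEc
# a      1      1      0
# d      0      1      1
```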

The other part of this is that my data frame has approximately 30 million rows, so I am looking for an efficient way to do this. Any help or comments would be greatly appreciated!

Thanks!

Answer

You can use table() to cross-tabulate ID against VALUE:

d <- table(df$ID, df$VALUE)
#    a b c
#  a 1 2 0
#  d 0 1 1

If you have to enforce values of 1 or 0 because some combinations show up more than once, then you can convert those cases to 1:

d[d > 1L] <- 1
#    a b c
#  a 1 1 0
#  d 0 1 1
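If you need the result as a data frame with an explicit ID column, as in the question's desired layout, the binary table can be converted with as.data.frame.matrix(). This is a minimal sketch assuming the sample data given with this answer; the name wide is illustrative. Plain as.data.frame() on a table would instead return long format (one row per ID/VALUE combination plus a Freq column), which is why the .matrix method is used here.

```r
df <- data.frame(ID = c("a", "a", "a", "d", "d"),
                 VALUE = c("a", "b", "b", "b", "c"))

# Cross-tabulate, then turn any count above 1 into 1
d <- table(df$ID, df$VALUE)
d[d > 1L] <- 1

# Keep the wide layout and move the IDs from row names into a column
wide <- data.frame(ID = rownames(d), as.data.frame.matrix(d))
rownames(wide) <- NULL
wide
#   ID a b c
# 1  a 1 1 0
# 2  d 0 1 1
```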

Sample data

df <- data.frame(ID = c("a", "a", "a", "d", "d"),
                 VALUE = c("a", "b", "b", "b", "c"))
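With roughly 30 million rows and possibly many distinct VALUE levels, a sparse indicator matrix may be cheaper to hold than a dense table. The sketch below is not part of the original answer: it uses sparseMatrix() from the Matrix package (a recommended package distributed with R), assumes that repeated ID/VALUE pairs should count once, and uses illustrative names (pairs, id_f, val_f, m). Its performance at that scale has not been measured here.

```r
library(Matrix)  # recommended package shipped with R

df <- data.frame(ID = c("a", "a", "a", "d", "d"),
                 VALUE = c("a", "b", "b", "b", "c"))

# Drop repeated ID/VALUE pairs so every stored entry is exactly 1
pairs <- unique(df[c("ID", "VALUE")])

# Integer codes: rows index IDs, columns index VALUE levels
id_f <- factor(pairs$ID)
val_f <- factor(pairs$VALUE)

# Build a sparse 0/1 matrix; only the nonzero cells are stored
m <- sparseMatrix(
  i = as.integer(id_f),
  j = as.integer(val_f),
  x = 1,
  dims = c(nlevels(id_f), nlevels(val_f)),
  dimnames = list(levels(id_f), levels(val_f))
)
m
```

If the number of levels is small, as.matrix(m) gives an ordinary dense matrix with the same row and column names.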
