One Hot Encoding From Multiple Rows in R
Question
Suppose I have data in the following format:
ID VALUE
a a
a b
d b
d c
What I would like to do is a one-hot encoding of VALUE for each ID. When I use model.matrix, I obtain:
model.matrix(~VALUE-1, df)
ID aVALUE bVALUE cVALUE
a 1 0 0
a 0 1 0
d 0 1 0
d 0 0 1
What I would like to get, however, is this:
ID aVALUE bVALUE cVALUE
a 1 1 0
d 0 1 1
The other part of this is that my data frame has approximately 30 million rows, so I am looking for an efficient way to do this. Any help or comments would be greatly appreciated!
Thanks!
Answer
You can use table():
d <- table(df$ID, df$VALUE)
#     a b c
#   a 1 2 0
#   d 0 1 1
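A self-contained version of the table() step, for anyone who wants the result in the same shape as the desired output in the question (a data frame with an ID column). This is a sketch that assumes the example data from this answer; the names `counts` and `result`, and the `VALUE` suffix added to the column names, are illustrative choices rather than part of the original answer.

```r
# Example data (same values as the "Example data" section of this answer)
df <- data.frame(
  ID    = c("a", "a", "a", "d", "d"),
  VALUE = c("a", "b", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per ID, one column per distinct VALUE
d <- table(df$ID, df$VALUE)

# Turn the 2-D table into an ordinary data frame (columns a, b, c; row names a, d)
counts <- as.data.frame.matrix(d)

# Rename columns to match the question's layout (aVALUE, bVALUE, cVALUE)
names(counts) <- paste0(names(counts), "VALUE")

# Move the IDs from the row names into a regular column;
# passing row.names = NULL explicitly drops the old row names
result <- data.frame(ID = rownames(counts), counts, row.names = NULL)
print(result)
```

The counts are still raw frequencies at this point, so the repeated (a, b) pair shows up as 2 until it is clamped.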
If you need to enforce values of 1 or 0 because some combinations show up more than once, you can convert those cases to 1:
d[d > 1L] <- 1
#     a b c
#   a 1 1 0
#   d 0 1 1
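If you would rather start from the model.matrix call in the question, base R's rowsum() can collapse the per-row indicator columns by ID, after which the same clamping step applies. A minimal sketch, assuming the example data from this answer; note that model.matrix actually names its columns VALUEa, VALUEb, VALUEc.

```r
df <- data.frame(
  ID    = c("a", "a", "a", "d", "d"),
  VALUE = c("a", "b", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# One indicator column per VALUE level, one row per input row
# (character VALUE is converted to a factor by model.frame)
mm <- model.matrix(~ VALUE - 1, df)

# rowsum() adds up all indicator rows that share the same ID;
# the result has one row per ID, sorted by ID
per_id <- rowsum(mm, df$ID)

# Clamp repeated (ID, VALUE) combinations back to 0/1
per_id[per_id > 1] <- 1
print(per_id)
```

This builds a dense indicator matrix with one row per input row before collapsing it, so with 30 million rows it is likely to need considerably more memory than table().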
Example data
df <- structure(list(ID = c("a", "a", "a", "d", "d"), VALUE = c("a", "b", "b", "b", "c")),
.Names = c("ID", "VALUE"), class = "data.frame", row.names = c(NA, -5L))
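With around 30 million rows and possibly many distinct IDs, a dense ID x VALUE table can become large. One option, assuming the Matrix package is available (it is a recommended package shipped with standard R installations), is xtabs() with sparse = TRUE, which returns a sparse matrix that stores only the non-zero cells. Deduplicating the (ID, VALUE) pairs with unique() first means every stored cell is 1, so no clamping step is needed. This is a sketch on the small example data, not a benchmarked solution.

```r
library(Matrix)  # sparse matrix classes used by xtabs(sparse = TRUE)

df <- data.frame(
  ID    = c("a", "a", "a", "d", "d"),
  VALUE = c("a", "b", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# Keep each (ID, VALUE) pair once, so every cell of the result is 0 or 1
pairs <- unique(df)

# Two-factor cross-tabulation returned as a sparse Matrix object
m <- xtabs(~ ID + VALUE, data = pairs, sparse = TRUE)
print(m)
```

Only the non-zero entries are stored, so memory grows with the number of distinct (ID, VALUE) pairs rather than with the number of IDs times the number of VALUE levels.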