Create new columns with dummies based on values
Question
I want to create new columns based on the values of a single existing column. It is event data (from a website), so the number of values varies from row to row. Like this:
row Events
1 237,2,236,102,106,111,114,115,116,117,118,119,125
2 237,111,116
3 102,106,111,114,115
4 237,2,236,102,106,111,114,115,116,117,118,119,125, 126
The result should be dummy (0/1) data, with one column per distinct value.
row 237 2 236 102 106 111 114 115 116 117 118 119 125 126
1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
2 1 0 0 0 0 1 0 0 1 0 0 0 0 0
3 0 0 0 1 1 1 1 1 0 0 0 0 0 0
4 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I tried to solve this with the tidyr separate function, in combination with the createDummyFeatures function (mlr package). However, I had to name the columns manually; ideally each column should take the name of its value, as in the example.
Recommended answer
We can use the table approach: split the Events column on "," with strsplit, name each piece by its row, convert the result to a data.frame with stack, and cross-tabulate.
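# Split on commas (dropping any space after them), name by row, stack to long form, cross-tabulate
table(stack(setNames(strsplit(df1$Events, ",\\s*"), df1$row))[2:1])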
Data
df1 <- structure(list(row = 1:4,
Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125",
"237,111,116", "102,106,111,114,115",
"237,2,236,102,106,111,114,115,116,117,118,119,125, 126"
)), .Names = c("row", "Events"), class = "data.frame", row.names = c(NA,
-4L))
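The one-liner in the answer can be unpacked into separate steps, which may make it easier to follow and to adapt. The sketch below is base R only and rebuilds the sample data so that it runs on its own. The intermediate names (events, long, tab, dummies) are illustrative and not part of the original answer, and the final reordering of columns by first appearance is an added assumption about what the asker wants, since table sorts the value names as strings.

```r
# Sample data, same values as df1 above (the stray space before 126 is kept on purpose)
df1 <- data.frame(
  row = 1:4,
  Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125",
             "237,111,116",
             "102,106,111,114,115",
             "237,2,236,102,106,111,114,115,116,117,118,119,125, 126"),
  stringsAsFactors = FALSE
)

# 1. Split each string on commas, swallowing any surrounding whitespace
events <- strsplit(df1$Events, "\\s*,\\s*")

# 2. Name each list element by its row id so stack() can carry it along
names(events) <- df1$row

# 3. Long format: one line per (value, row id) pair, in columns values and ind
long <- stack(events)

# 4. Cross-tabulate row id against event value to get the 0/1 indicators
tab <- table(row = long$ind, event = long$values)

# 5. Convert to a plain data.frame; the columns are named after the values
dummies <- as.data.frame.matrix(tab)

# Optional (assumption): order columns by first appearance instead of string sort
dummies <- dummies[, unique(as.character(long$values))]

print(dummies)
```

Because the column names come straight from the split values, nothing has to be named by hand. If a value could occur more than once within a single row, the table would hold counts rather than 0/1, so a final step such as (dummies > 0) + 0L would be needed to get strict dummies.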