Create new columns with dummies based on values


Problem description

I want to create new columns based on the values of a single existing column. It is event data (from a website), so the number of values differs from row to row. For example:

row    Events 
1       237,2,236,102,106,111,114,115,116,117,118,119,125
2       237,111,116
3       102,106,111,114,115
4       237,2,236,102,106,111,114,115,116,117,118,119,125, 126

The result should be dummy (0/1) columns, one for each distinct value:

row   237  2  236  102  106  111  114  115  116  117 118  119 125  126
1     1    1   1    1    1    1    1    1    1    1   1    1   1   0
2     1    0   0    0    0    1    0    0    1    0   0    0   0   0  
3     0    0   0    1    1    1    1    1    0    0   0    0   0   0
4     1    1   1    1    1    1    1    1    1    1   1    1   1   1

I tried to solve this with the separate function from tidyr, combined with the function "createDummyFeatures" (mlr package). However, I had to name the columns manually; ideally each column should take the name of its value, as in the example.

Recommended answer

We can use the table approach: split the Events strings on ",", name the resulting list by row, convert it to a long data.frame with stack, and cross-tabulate row against value.

table(stack(setNames(strsplit(df1$Events, ",\\s*"), df1$row))[2:1])
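
To make the one-liner easier to follow, here is a minimal sketch that runs the same steps one at a time. The data frame toy is a hypothetical, smaller stand-in for df1, and the variable names parts, long and tab are only illustrative.

```r
# Hypothetical toy data (not the df1 from the question)
toy <- data.frame(row = 1:3,
                  Events = c("237,2,111", "237,111", "2, 126"),
                  stringsAsFactors = FALSE)

# 1. Split each string on commas, tolerating whitespace after a comma
parts <- strsplit(toy$Events, ",\\s*")

# 2. Name each list element by its row id
parts <- setNames(parts, toy$row)

# 3. stack() gives a long data.frame: "values" (event) and "ind" (row id)
long <- stack(parts)

# 4. Reorder to (ind, values) and count: rows are row ids, columns are events
tab <- table(long[2:1])
print(tab)
```

Because the column names come from the values themselves, nothing has to be named by hand.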

Data

df1 <- structure(list(row = 1:4, 
 Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125", 
 "237,111,116", "102,106,111,114,115", 
 "237,2,236,102,106,111,114,115,116,117,118,119,125, 126"
)), .Names = c("row", "Events"), class = "data.frame", row.names = c(NA, 
 -4L))
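
Note that table sorts its columns as text ("102", "106", ..., "2", "236", "237"). If the columns should appear in order of first occurrence, as in the desired output, one possible sketch is to fix the factor levels before tabulating and then convert the table to an ordinary data.frame. The name dummies is only illustrative, and df1 is repeated here so the snippet runs on its own.

```r
# Same data as df1 above, repeated so this sketch is self-contained
df1 <- data.frame(
  row = 1:4,
  Events = c("237,2,236,102,106,111,114,115,116,117,118,119,125",
             "237,111,116",
             "102,106,111,114,115",
             "237,2,236,102,106,111,114,115,116,117,118,119,125, 126"),
  stringsAsFactors = FALSE
)

long <- stack(setNames(strsplit(df1$Events, ",\\s*"), df1$row))

# Keep event values in order of first appearance instead of text-sorted order
long$values <- factor(long$values, levels = unique(long$values))

# Cross-tabulate and convert the table to a plain data.frame
dummies <- as.data.frame.matrix(table(long[2:1]))

# Cap counts at 1 in case an event is listed twice within one row
dummies[] <- lapply(dummies, pmin, 1L)
print(dummies)
```

The row ids end up as row names; if a row column is needed, something like cbind(row = df1$row, dummies) should work.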
