Convert a column with pipe-delimited data into dummy variables
Question
I'm interested in taking a column of a data.frame where the values in the column are pipe delimited and creating dummy variables from the pipe-delimited values.
For example, suppose we start with:
df = data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim", "Jim|Steve|Ben"))
> df
a
1 Ben|Chris|Jim
2 Ben|Greg|Jim
3 Jim|Steve|Ben
I'd like to end up with:
df2 = data.frame(Ben = c(1, 1, 1), Chris = c(1, 0, 0), Jim = c(1, 1, 1), Greg = c(0, 1, 0),
Steve = c(0, 0, 1))
> df2
Ben Chris Jim Greg Steve
1 1 1 1 0 0
2 1 0 1 1 0
3 1 0 1 0 1
I don't know in advance how many potential values there are within the field. In the example above, the variable "a" can include 1 value or 10 values. Assume it is a reasonable number (i.e., < 100 possible values).
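Because the set of names is not known in advance, it has to be discovered from the data itself. A minimal base-R sketch of that step (an illustration added here, not part of the original question; it assumes column a holds character or factor values) splits every row on a literal pipe and collects the distinct names. `fixed = TRUE` matters because `|` is a regular-expression metacharacter.

```r
# Example data from the question
df <- data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim", "Jim|Steve|Ben"))

# Split each row on a literal "|" (fixed = TRUE, so "|" is not treated as regex alternation)
pieces <- strsplit(as.character(df$a), "|", fixed = TRUE)

# Distinct names found in the data, however many there are
values <- sort(unique(unlist(pieces)))
# values is c("Ben", "Chris", "Greg", "Jim", "Steve")
```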
Any good ways to do this?
Recommended answer
Another way is to use cSplit_e from the splitstackshape package.
Split the data frame on column a, fill the missing combinations with 0, and drop the original column.
library(splitstackshape)
# one 0/1 column per distinct value in "a"; absent values are filled with 0 and "a" is dropped
cSplit_e(df, "a", "|", type = "character", fill = 0, drop = TRUE)
# a_Ben a_Chris a_Greg a_Jim a_Steve
#1 1 1 0 1 0
#2 1 0 1 1 0
#3 1 0 0 1 1
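If installing splitstackshape is not an option, the same three steps (split, fill with 0, keep only the dummy columns) can be sketched in base R. This is an alternative written for illustration, not how cSplit_e is implemented; the names pieces, values, dummies, and res are hypothetical, and the a_ prefix only mimics the cSplit_e column names shown above.

```r
df <- data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim", "Jim|Steve|Ben"))

# 1. Split column "a" on a literal "|" and collect the distinct values
pieces <- strsplit(as.character(df$a), "|", fixed = TRUE)
values <- sort(unique(unlist(pieces)))

# 2. Start from an all-zero matrix (the "fill = 0" step),
#    then set a 1 wherever a row contains that value
dummies <- matrix(0L, nrow = length(pieces), ncol = length(values),
                  dimnames = list(NULL, paste0("a_", values)))
for (i in seq_along(pieces)) {
  dummies[i, match(pieces[[i]], values)] <- 1L
}

# 3. Keep only the dummy columns; the original "a" is not carried over
#    (this example df has no other columns to keep)
res <- as.data.frame(dummies)
```

The 0/1 values in res agree with the cSplit_e output above; to get the unprefixed names used in df2, drop the paste0("a_", ...) call.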