拆分字符串列以创建新的二进制列 [英] Split string column to create new binary columns

查看:46
本文介绍了拆分字符串列以创建新的二进制列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的数据只有一列,我正在尝试使用行中每个/"之后的内容创建其他列.这是数据的前几行:

My data has one column and I am trying to create additional columns with what’s after each "/" in the rows. Here are the first few rows of the data:

> dput(mydata)
structure(list(ALL = structure(c(1L, 4L, 4L, 3L, 2L), .Label = c("/
ca/put/sent_1/fe.gr/eq2_on/eq2_off",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cbr_LBL", "/ca/put/sent_1/fe.g
r/eq2_on/eq2_off/cni_at.p3x.4",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/hi.on/hi.ov"), class = "factor
")), .Names = "ALL", class = "data.frame", row.names = c(NA,
-5L))

结果应如下所示(数据框),如果变量出现在该行中,则在新列中为1",否则为0":

The result should look like this (data frame) with a "1" in the new column if the variable appears in the row and "0" if not:

> dput(Result)
structure(list(ALL = structure(c(1L, 4L, 5L, 3L, 2L), .Label = c("/ca
/put/sent_1/fe.gr/eq2_on/eq2_off",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cbr_LBL", "/ca/put/sent_1/fe.gr/
eq2_on/eq2_off/cni_at.p3x.4",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/hi.on/hi.ov", "/ca/put/sent_1fe.
gr/eq2_on/eq2_off/hi.on/hi.ov"
), class = "factor"), ca = c(1L, 1L, 1L, 1L, 1L), put = c(1L,
1L, 1L, 1L, 1L), sent_1 = c(1L, 1L, 1L, 1L, 1L), fe.gr = c(1L,
1L, 1L, 1L, 1L), eq2_on = c(1L, 1L, 1L, 1L, 1L), eq2_off = c(1L,
1L, 1L, 1L, 1L), hi.on = c(0L, 1L, 1L, 0L, 0L), hi.ov = c(0L,
1L, 1L, 0L, 0L), cni_at.p3x.4 = c(0L, 0L, 0L, 1L, 0L), cbr_LBL = c(0L
,
0L, 0L, 0L, 1L)), .Names = c("ALL", "ca", "put", "sent_1", "fe.gr",
"eq2_on", "eq2_off", "hi.on", "hi.ov", "cni_at.p3x.4", "cbr_LBL"
), class = "data.frame", row.names = c(NA, -5L))

我尝试了很多函数,包括 strsplit 和 sapply:

I have tried many functions including strsplit and sapply:

sapply(strsplit(as.character(mydata$ALL), "\\/"), "[[", 2) #returns "ca"s only

sapply(strsplit(as.character(mydata$ALL), "\\/"), "[[", 3) #returns "put"s only

有数百万行,我非常感谢任何快速有效的方法.

There are millions of rows and I’d greatly appreciate anything that is quick and efficient.

推荐答案

使用我维护的 qdapTools 包中的 mtabuate:

Using mtabuate from the qdapTools package that I maintain:

library(qdapTools)
mtabulate(strsplit(as.character(dat[[1]]), "/"))

##   V1 ca cbr_LBL cni_at.p3x.4 eq2_off eq2_on fe.gr hi.on hi.ov put sent_1 sent_1fe.gr
## 1  1  1       0            0       1      1     1     0     0   1      1           0
## 2  1  1       0            0       1      1     1     1     1   1      1           0
## 3  1  1       0            0       1      1     0     1     1   1      0           1
## 4  1  1       0            1       1      1     1     0     0   1      1           0
## 5  1  1       1            0       1      1     1     0     0   1      1           0

这篇关于拆分字符串列以创建新的二进制列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆