Split a column into multiple binary dummy columns

Published: 2020/10/16 20:50:20 | Tags: r, dataframe


Problem description

I'm trying to split a single "character" variable in my data frame into multiple "factor" variables.

> sampledf=data.frame(vin=c('v1','v2','v3'),features=c('f1:f2:f3','f2:f4:f5','f1:f4:f5'))
> sampledf
  vin features
1  v1 f1:f2:f3
2  v2 f2:f4:f5
3  v3 f1:f4:f5

> desireddf=data.frame(vin=c('v1','v2','v3'),f1=c(1,0,1),f2=c(1,1,0),f3=c(1,0,0),f4=c(0,1,1),f5=c(0,1,1))
> desireddf
  vin f1 f2 f3 f4 f5
1  v1  1  1  1  0  0
2  v2  0  1  0  1  1
3  v3  1  0  0  1  1

I've tried separating the features column using strsplit()

strsplit(as.character(df$features), ";")

but have had no luck factorising them.
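
One likely reason the attempt above does not get anywhere is that the sample data separates features with ':' while the call splits on ';' (it also refers to df rather than sampledf). The following is a minimal sketch, assuming the sampledf shown above; the names wrong and parts are only illustrative.

```r
sampledf <- data.frame(vin = c('v1', 'v2', 'v3'),
                       features = c('f1:f2:f3', 'f2:f4:f5', 'f1:f4:f5'))

# Splitting on ';' finds no delimiter, so each string comes back whole
wrong <- strsplit(as.character(sampledf$features), ';')
lengths(wrong)

# Splitting on ':' yields one character vector of features per row
parts <- strsplit(as.character(sampledf$features), ':')
lengths(parts)
```

With the correct delimiter, each list element holds the individual features of one row, which is the input the tabulation approaches need.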

Recommended answer

We can use mtabulate from qdapTools after splitting the 'features' column with strsplit().
library(qdapTools)
cbind(sampledf[1],mtabulate(strsplit(as.character(sampledf$features), ':')))
#  vin f1 f2 f3 f4 f5
#1  v1  1  1  1  0  0
#2  v2  0  1  0  1  1
#3  v3  1  0  0  1  1
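
For readers without qdapTools, the per-row tabulation idea can be sketched in base R. This is an illustration of the concept only, not how mtabulate is implemented; the names parts, lev, counts and result are hypothetical.

```r
sampledf <- data.frame(vin = c('v1', 'v2', 'v3'),
                       features = c('f1:f2:f3', 'f2:f4:f5', 'f1:f4:f5'))

parts <- strsplit(as.character(sampledf$features), ':')

# Every feature that appears in any row, in sorted order
lev <- sort(unique(unlist(parts)))

# For each row, count how often each feature occurs (0 when absent)
counts <- t(vapply(parts,
                   function(p) tabulate(match(p, lev), nbins = length(lev)),
                   integer(length(lev))))
colnames(counts) <- lev

result <- cbind(sampledf[1], counts)
result
```

Counting against a shared set of levels is what guarantees every row gets the same columns, including zeros for features it lacks.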

Or we can use cSplit_e from library(splitstackshape):
library(splitstackshape)
df1 <- cSplit_e(sampledf, 'features', ':', type= 'character', fill=0, drop=TRUE)
names(df1) <-  sub('.*_', '', names(df1))

Or, using base R methods, we split as before, set the names of the list elements returned by strsplit to the 'vin' column, convert the list to a key/value 'data.frame' using stack, get the table, transpose it, and cbind it with the first column of 'sampledf'.

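# as.data.frame.matrix() keeps the table wide; cbind() would otherwise
# convert a bare table to long format (one row per vin/feature pair)
cbind(sampledf[1],
      as.data.frame.matrix(
        t(table(stack(setNames(strsplit(as.character(sampledf$features), ':'),
                               sampledf$vin))))))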
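
The nested base R call is easier to follow when unpacked one step at a time. The sketch below assumes the sampledf from the question; the intermediate names (parts, named, long, tab, wide) are only illustrative, and as.data.frame.matrix is used here so the table stays in wide form when it is bound to the data frame.

```r
sampledf <- data.frame(vin = c('v1', 'v2', 'v3'),
                       features = c('f1:f2:f3', 'f2:f4:f5', 'f1:f4:f5'))

# 1. Split each feature string into a character vector
parts <- strsplit(as.character(sampledf$features), ':')

# 2. Name each list element after its vin
named <- setNames(parts, as.character(sampledf$vin))

# 3. Stack into a key/value data frame: 'values' (feature) and 'ind' (vin)
long <- stack(named)

# 4. Cross-tabulate, then transpose so rows are vins and columns are features
tab <- t(table(long))

# 5. Bind the wide counts to the vin column
wide <- cbind(sampledf[1], as.data.frame.matrix(tab))
wide
```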
