Dummify character column and find unique values


Question


I have a data frame with the following structure:

test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'))


Now I want to create a data frame from this that contains a named column for each of the unique values in the test data frame. A value is the text terminated by a ';' character; except for the first value in a row, it is preceded by a space, which is not part of the value. Then, for each row of the column, I want to fill each dummy column with 1 if that row contains the value and 0 otherwise, as shown below:

data.frame(a = c(1,1), ff = c(1,0), cc = c(1,1), rr = c(1,1), e = c(0,1))

  a ff cc rr e
1 1  1  1  1 0
2 1  0  1  1 1


I tried creating a data frame using for loops and the unique values in the column, but it's getting too messy. I already have a vector containing the unique values of the column; the problem is how to create the ones and zeros. I tried a mutate_all() call with grep(), but that did not work.
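The grep() idea can work in base R if each value is matched as a whole token, so that, for example, "a" is not found inside "aa". The following is a minimal sketch, not the accepted answer's method; it assumes the values contain no regular-expression metacharacters (otherwise they would need escaping), and the names vals and dummies are illustrative only.

```r
# Sample data from the question
test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'))
col <- as.character(test$col)  # guard against factors in R < 4.0

# Unique values: split on ";", trim the spaces, drop any empty strings
vals <- unique(trimws(unlist(strsplit(col, ";", fixed = TRUE))))
vals <- vals[vals != ""]

# One 0/1 column per value. The pattern requires the value to start the
# string or follow "; ", and to be followed by ";", i.e. a whole token.
# ASSUMPTION: values contain no regex metacharacters.
dummies <- as.data.frame(setNames(
  lapply(vals, function(v) as.integer(grepl(paste0("(^|; )", v, ";"), col))),
  vals
))
dummies
```

Because vals keeps the order of first appearance, the columns come out as a, ff, cc, rr, e, matching the desired output.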

Recommended answer


I'd use cSplit from the splitstackshape package and mtabulate from the qdapTools package to get this as a one-liner, i.e.

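library(splitstackshape)  # provides cSplit() and cSplit_e()
library(qdapTools)        # provides mtabulate()

# cSplit() splits 'col' on ';' into wide columns (stripping whitespace by default),
# t() turns each original row into a column, and mtabulate() counts the values per column
mtabulate(as.data.frame(t(cSplit(test, 'col', sep = ';', direction = 'wide'))))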
#   a cc ff rr e
#V1 1  1  1  1 0
#V2 1  1  0  1 1
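The same idea (tabulating the values found in each row) can be sketched in base R without extra packages by building a long (row, value) table and cross-tabulating it. This sketch is not the qdapTools implementation. One difference: mtabulate returns counts, whereas this version caps them at 1 so that a value repeated within a row still yields a 0/1 indicator. The variable names are illustrative.

```r
test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'))

# Split each row into tokens and trim the space that follows each ";"
tokens <- lapply(strsplit(as.character(test$col), ";", fixed = TRUE), trimws)

# Long format: one (row, value) pair per token, empty tokens removed
long <- data.frame(
  row   = rep(seq_along(tokens), lengths(tokens)),
  value = unlist(tokens)
)
long <- long[long$value != "", ]

# Cross-tabulate rows against values, keeping first-appearance order
counts <- table(long$row, factor(long$value, levels = unique(long$value)))

# Convert counts to 0/1 indicators and return a plain data frame
ind <- unclass(counts) > 0
storage.mode(ind) <- "integer"
dummies <- as.data.frame(ind)
dummies
```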


It can also be done entirely with splitstackshape, as @A5C1D2H2I1M1N2O1R2T1 mentions in the comments:

cSplit_e(test, "col", ";", mode = "binary", type = "character", fill = 0)
