How to strsplit a different number of strings in a certain column with the do function (dplyr)


Problem description

I have a problem splitting a column's values when its elements contain different numbers of strings. I can do it in plyr, e.g.:

library(plyr)
column <- c("jake", "jane jane","john john john")
df <- data.frame(1:3, name = column)
df$name <- as.character(df$name)
df2 <- ldply(strsplit(df$name, " "), rbind)
View(df2)

As a result, we get a data frame whose number of columns equals the maximum number of strings in any element.

When I tried to do the same in dplyr, I used the do function:

library(dplyr)
df2 <- df %>%
  do(data.frame(strsplit(.$name, " ")))

But I get an error:

Error in data.frame("jake", c("jane", "jane"), c("john", "john", "john" : 
arguments imply differing number of rows: 1, 2, 3

It seems to me that the rbind function should be used, but I do not know where.

Recommended answer

You're having trouble because strsplit() returns a list, and as.data.frame.list() would then have to be applied to each element to get it into the format that dplyr requires. Even then, it would still take a bit more work to get usable results. Long story short, this doesn't seem like a suitable operation for do().
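To illustrate why the pieces cannot be handed to data.frame() directly, here is a minimal base R sketch (not part of the original answer). It assumes the same three example names as the question; the helper names parts, n_max and padded are made up for illustration. Each split vector is padded with NA to a common length before the rows are bound together with rbind.

```r
column <- c("jake", "jane jane", "john john john")

# strsplit() returns a list whose elements have lengths 1, 2 and 3
parts <- strsplit(column, " ")

# The longest element decides how many columns are needed
n_max <- max(lengths(parts))

# Indexing past the end of a character vector yields NA,
# so every element is padded to n_max entries
padded <- lapply(parts, function(p) p[seq_len(n_max)])

# Now all rows have equal length and can be bound into one data frame
df2 <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df2) <- as.character(seq_len(n_max))
df2
```

This is the step where rbind comes in: it only works once every row has the same number of entries.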

I think you might be better off using separate() from tidyr. It can easily be used with dplyr functions and chains. It's not clear whether you want to keep the first column, since your ldply result for df2 does not have it, so I left it off.

library(tidyr)
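# Recent tidyr versions expect `into` to be a character vector;
# fill = "right" pads short rows with NA without a warning.
separate(df[-1], name, as.character(1:3), sep = " ", extra = "merge", fill = "right")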
#      1    2    3
# 1 jake <NA> <NA>
# 2 jane jane <NA>
# 3 john john john
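If the maximum number of pieces is not known in advance, the column names can be derived from the data instead of being hardcoded. The following is a sketch under assumptions, not the answerer's code: the id column name and the name_ prefix are invented for illustration, and the first column is kept here to show that separate() works inside a pipe chain.

```r
library(dplyr)
library(tidyr)

# Hypothetical rebuild of the question's data with an explicit id column
df <- data.frame(id = 1:3,
                 name = c("jake", "jane jane", "john john john"),
                 stringsAsFactors = FALSE)

# Count the pieces in each element and take the maximum
n_max <- max(lengths(strsplit(df$name, " ", fixed = TRUE)))

result <- df %>%
  separate(name,
           into = paste0("name_", seq_len(n_max)),  # name_1, name_2, ...
           sep = " ",
           fill = "right")                          # pad short rows with NA
result
```

Computing n_max first keeps the call valid when a longer name appears later in the data.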

You could also use cSplit from splitstackshape. It is also very efficient, since it relies on data.table.

library(splitstackshape)
cSplit(df[-1], "name", " ")
#    name_1 name_2 name_3
# 1:   jake     NA     NA
# 2:   jane   jane     NA
# 3:   john   john   john

Or, more specifically:

library(data.table)  # provides setnames()
df2 <- cSplit(df[-1], "name", " ")
setnames(df2, as.character(1:3))  # rename all columns by reference
df2
#       1    2    3
# 1: jake   NA   NA
# 2: jane jane   NA
# 3: john john john
