How to strsplit a column whose elements contain different numbers of strings, using the do function

Problem description

I have a problem splitting column values when the elements of the column contain different numbers of strings. I can do it in plyr, e.g.:

library(plyr)
column <- c("jake", "jane jane","john john john")
df <- data.frame(1:3, name = column)
df$name <- as.character(df$name)
df2 <- ldply(strsplit(df$name, " "), rbind)
View(df2)

As a result, we get a data frame whose number of columns equals the maximum number of strings in any element.

When I tried to do the same in dplyr, I used the do function:

library(dplyr)
df2 <- df %>%
  do(data.frame(strsplit(.$name, " ")))

But I get an error:

Error in data.frame("jake", c("jane", "jane"), c("john", "john", "john" : 
arguments imply differing number of rows: 1, 2, 3

It seems to me that the rbind function should be used, but I do not know where.

Recommended answer

You're having trouble because strsplit() returns a list, and as.data.frame.list() would then have to be applied to each element to get it into the format that dplyr requires. Even then, it would still take a bit more work to get usable results. Long story short, this doesn't seem like a suitable operation for do().
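To illustrate the point about list elements of unequal length, here is a minimal base R sketch (not part of the original answer) that answers the "where does rbind go" question directly. It assumes the same `column` vector as in the question; `parts`, `n_max`, `padded`, and `result` are names made up for this example. Each element is padded with NA to a common length first, and only then are the rows bound together.

```r
column <- c("jake", "jane jane", "john john john")

# strsplit() returns a list of character vectors of lengths 1, 2 and 3,
# which is why data.frame() complains about differing numbers of rows
parts <- strsplit(column, " ")

# Pad every element with NA up to the longest length
n_max <- max(lengths(parts))
padded <- lapply(parts, `length<-`, n_max)

# Now every row has the same length, so rbind() is safe
result <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(result) <- as.character(seq_len(n_max))
result
```

This is roughly what ldply(..., rbind) does for you in plyr; the packages suggested below handle the padding themselves.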

I think you might be better off using separate() from tidyr. It can easily be used with dplyr functions and chains. It's not clear whether you want to keep the first column, since your ldply result for df2 does not have it, so I left it off.

library(tidyr)
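# `into` must be a character vector in current tidyr; fill = "right" pads short rows with NA
separate(df[-1], name, into = as.character(1:3), sep = " ", extra = "merge", fill = "right")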
#      1    2    3
# 1 jake <NA> <NA>
# 2 jane jane <NA>
# 3 john john john

You could also use cSplit() from splitstackshape. It is also very efficient, since it relies on data.table.

library(splitstackshape)
cSplit(df[-1], "name", " ")
#    name_1 name_2 name_3
# 1:   jake     NA     NA
# 2:   jane   jane     NA
# 3:   john   john   john

Or, more specifically, with the same column names as above:

df2 <- cSplit(df[-1], "name", " ")
data.table::setnames(df2, as.character(1:3))  # rename all columns by reference
df2
#       1    2    3
# 1: jake   NA   NA
# 2: jane jane   NA
# 3: john john john
