How to strsplit a different number of strings in a certain column using the do function
Question
I have a problem splitting a column's values when its elements contain different numbers of strings. I can do it in plyr, e.g.:
library(plyr)
column <- c("jake", "jane jane","john john john")
df <- data.frame(1:3, name = column)
df$name <- as.character(df$name)
df2 <- ldply(strsplit(df$name, " "), rbind)
View(df2)
As a result, we get a data frame whose number of columns equals the maximum number of strings in any element.
When I tried to do the same in dplyr, I used the do function:
library(dplyr)
df2 <- df %>%
do(data.frame(strsplit(.$name, " ")))
But I get an error:
Error in data.frame("jake", c("jane", "jane"), c("john", "john", "john" :
arguments imply differing number of rows: 1, 2, 3
It seems to me that the rbind function should be used, but I do not know where.
Recommended answer
You're having trouble because strsplit() returns a list, and each element of that list would need to go through as.data.frame.list() to get it into the format that dplyr requires. Even then, it would still take a bit more work to get usable results. Long story short, this doesn't seem like a suitable operation for do().
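As for where rbind fits: the error comes from data.frame() receiving vectors of length 1, 2 and 3, so the pieces have to be padded to a common length before they can be stacked as rows. Below is a minimal base R sketch of that idea, assuming NA padding on the right is acceptable. The names `id`, `parts`, `width` and `padded` are illustrative and not from the question.

```r
# Same data as in the question; `id` is an illustrative name for the first column
column <- c("jake", "jane jane", "john john john")
df <- data.frame(id = 1:3, name = column, stringsAsFactors = FALSE)

# strsplit() returns a list of character vectors with lengths 1, 2 and 3
parts <- strsplit(df$name, " ")

# Pad every vector with NA up to the longest length so all rows match
width <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, width - length(p))))

# rbind goes here: stack the equal-length vectors into a character matrix
df2 <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df2) <- as.character(seq_len(width))
df2
```

This uses no packages and is not a dplyr solution; it only shows the padding step that rbind needs before the rows can be combined.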
I think you might be better off using separate() from tidyr. It works easily with dplyr functions and chains. It's not clear whether you want to keep the first column, since your ldply result for df2 does not have it, so I left it out.
library(tidyr)
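# `into` must be a character vector; fill = "right" pads short rows with NA without a warning
separate(df[-1], name, into = as.character(1:3), sep = " ", extra = "merge", fill = "right")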
# 1 2 3
# 1 jake <NA> <NA>
# 2 jane jane <NA>
# 3 john john john
You could also use cSplit. It is also very efficient, since it relies on data.table:
library(splitstackshape)
cSplit(df[-1], "name", " ")
# name_1 name_2 name_3
# 1: jake NA NA
# 2: jane jane NA
# 3: john john john
Or, to get the same column names as above:
library(data.table)  # provides setnames()
df2 <- cSplit(df[-1], "name", " ")
setnames(df2, names(df2), as.character(1:3))  # rename name_1..name_3 to 1..3
df2
# 1 2 3
# 1: jake NA NA
# 2: jane jane NA
# 3: john john john