strsplit by row and distribute results by column in data.frame

Problem description

I have the data.frame

dat <- data.frame(x = c('Sir Lancelot the Brave', 'King Arthur',
                        'The Black Knight', 'The Rabbit'),
                  stringsAsFactors = FALSE)

> dat
                       x
1 Sir Lancelot the Brave
2            King Arthur
3       The Black Knight
4             The Rabbit

and I want to transform it into this data frame:

> dat2
                       x    1            2       3      4
1 Sir Lancelot the Brave    Sir   Lancelot     the  Brave
2            King Arthur    King    Arthur
3       The Black Knight    The      Black  Knight 
4             The Rabbit    The     Rabbit

strsplit returns the data as a list

sbt <- strsplit(dat$x, " ")
> sbt
[[1]]
[1] "Sir"      "Lancelot" "the"      "Brave"   

[[2]]
[1] "King"   "Arthur"

[[3]]
[1] "The"    "Black"  "Knight"

[[4]]
[1] "The"    "Rabbit"

and as.data.table does not fill the gaps with missing values where it should; it repeats (recycles) the values instead:

> t(as.data.table(sbt))
   [,1]   [,2]       [,3]     [,4]    
V1 "Sir"  "Lancelot" "the"    "Brave" 
V2 "King" "Arthur"   "King"   "Arthur"
V3 "The"  "Black"    "Knight" "The"   
V4 "The"  "Rabbit"   "The"    "Rabbit"
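
The repeated values come from the split pieces having different lengths. As a minimal base R check (the names word_counts and n_missing are made up here for illustration), this counts the pieces per string and how many cells each row would need filled with NA:

```r
x <- c("Sir Lancelot the Brave", "King Arthur",
       "The Black Knight", "The Rabbit")
sbt <- strsplit(x, " ", fixed = TRUE)

# Number of pieces produced for each input string
word_counts <- lengths(sbt)
word_counts
# [1] 4 2 3 2

# Cells that must be padded with NA to make every row equally long
n_missing <- max(word_counts) - word_counts
```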



I guess I would really like an argument along the lines of as.data.table(x, repeat=FALSE). Failing that, how can I accomplish this?

Recommended answer

Here's one option. The single complication is that you first need to convert each vector to a one-row data.frame, because data.frames are what rbind.fill() expects.

library(plyr)
rbind.fill(lapply(sbt, function(X) data.frame(t(X))))
#     X1       X2     X3    X4
# 1  Sir Lancelot    the Brave
# 2 King   Arthur   <NA>  <NA>
# 3  The    Black Knight  <NA>
# 4  The   Rabbit   <NA>  <NA>
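
To see what the data.frame(t(X)) step does for a single element, here is a small base R sketch (the name one_row is made up for illustration). Transposing a character vector gives a one-row matrix, which data.frame() then turns into one row with one column per word:

```r
X <- c("King", "Arthur")

# t() on a plain vector yields a 1 x 2 matrix; data.frame() makes
# each matrix column its own variable
one_row <- data.frame(t(X), stringsAsFactors = FALSE)

dim(one_row)
# [1] 1 2
names(one_row)
# [1] "X1" "X2"
```

Because every element becomes a one-row frame with columns X1, X2, ..., the frames can be stacked by column name, and the shorter ones are filled with NA.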

My own inclination, though, would be to just use base R, like this:

n <- max(sapply(sbt, length))
l <- lapply(sbt, function(X) c(X, rep(NA, n - length(X))))
data.frame(t(do.call(cbind, l)))
#     X1       X2     X3    X4
# 1  Sir Lancelot    the Brave
# 2 King   Arthur   <NA>  <NA>
# 3  The    Black Knight  <NA>
# 4  The   Rabbit   <NA>  <NA>
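
To get all the way to the dat2 layout from the question (the original x column next to the split words), the padding can also be done with `length<-`, which extends a vector with NA. This is a minimal base R sketch rather than part of the original answer; the names padded, words and dat2 are chosen here for illustration, and the word columns are named 1 to 4 to match the question.

```r
dat <- data.frame(x = c("Sir Lancelot the Brave", "King Arthur",
                        "The Black Knight", "The Rabbit"),
                  stringsAsFactors = FALSE)

sbt <- strsplit(dat$x, " ", fixed = TRUE)
n   <- max(lengths(sbt))

# `length<-` stretches each vector to n elements, padding with NA
padded <- lapply(sbt, `length<-`, n)

# Stack the padded vectors as rows of a character matrix
words <- do.call(rbind, padded)
colnames(words) <- seq_len(n)

# check.names = FALSE keeps the numeric column names "1".."4" as they are
dat2 <- data.frame(dat, words, check.names = FALSE,
                   stringsAsFactors = FALSE)
```

The missing cells come out as NA rather than the empty cells shown in the question; is.na(dat2) can be used to locate them if they need to be replaced with "".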
