Splitting text column into ragged multiple new columns in a data table in R

Problem description

I have a data table containing 20000+ rows and one column. Each row holds a string with a different number of words. I want to split the strings into words and put each word in its own new column. I know how to do this one word at a time:

Data[, Word1 := as.character(lapply(strsplit(as.character(Data$complaint), split = " "), "[", 1))]

(Data is my data table and complaint is the name of the column.)

Obviously, this is not efficient: each row has a different number of words, so I would have to repeat it for every word position.

Could you please tell me about a more efficient way to do this?

Solution

Check out cSplit from my "splitstackshape" package. It works on either data.frames or data.tables (but always returns a data.table).

Assuming KFB's sample data is at least slightly representative of your actual data, you can try:

library(splitstackshape)
cSplit(df, "x", " ")
#     x_1      x_2         x_3 x_4
# 1: This       is interesting  NA
# 2: This actually          is not
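
KFB's sample data is not reproduced here. The sketch below assumes a data frame df with a single character column x, inferred from the printed output above, and uses only base R to show what a ragged split padded with NA looks like. It illustrates the shape of the result, not how cSplit is implemented; the names words, n_max, padded and result are hypothetical.

```r
# Hypothetical sample data, inferred from the printed output above
df <- data.frame(
  x = c("This is interesting", "This actually is not"),
  stringsAsFactors = FALSE
)

# Split each string on single spaces; gives a list of character vectors
words <- strsplit(df$x, " ", fixed = TRUE)

# Indexing past the end of a vector yields NA, which pads short rows
n_max <- max(lengths(words))
padded <- do.call(rbind, lapply(words, function(w) w[seq_len(n_max)]))
colnames(padded) <- paste0("x_", seq_len(n_max))

# Attach the new columns next to the original column
result <- cbind(df, as.data.frame(padded, stringsAsFactors = FALSE))
print(result)
```

For 20000+ rows the packaged functions are likely to be faster than this loop over list elements, so treat it as an explanation of the output rather than a replacement.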


Another (blazing fast) option is to use stri_split_fixed with simplify = TRUE (from "stringi"), which looks set to make its way into the "splitstackshape" code soon:

library(stringi)
stri_split_fixed(df$x, " ", simplify = TRUE)
#      [,1]   [,2]       [,3]          [,4] 
# [1,] "This" "is"       "interesting" NA   
# [2,] "This" "actually" "is"          "not"
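
The matrix above still has to be attached to the data table from the question. A minimal sketch of one way to do that follows, assuming a data.table named Data with a column complaint (as in the question) and sample values made up for illustration. It passes simplify = NA rather than TRUE because, as far as I know, current stringi releases pad short rows with empty strings under simplify = TRUE and with NA under simplify = NA; the NA in the output above appears to reflect an older release.

```r
library(data.table)
library(stringi)

# Hypothetical stand-in for the asker's table
Data <- data.table(
  complaint = c("This is interesting", "This actually is not")
)

# One row per input string, one column per word, short rows padded with NA
words <- stri_split_fixed(Data$complaint, " ", simplify = NA)

# Add one new column per word position, by reference
new_cols <- paste0("Word", seq_len(ncol(words)))
Data[, (new_cols) := as.data.table(words)]
print(Data)
```

The column names Word1, Word2, ... follow the naming used in the question; any other prefix would work equally well.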
