Splitting a string into new rows in R


Problem description

I have a data set like below:

Country Region    Molecule      Item Code   
    IND     NA       PB102      FR206985511 
   THAI     AP       PB103      BA-107603 / F000113361 / 107603
   LUXE     NA       PB105      1012701 / SGP-1012701 / F041701000
    IND     AP       PB106      AU206985211 / CA-F206985211
   THAI     HP       PB107      F034702000 / 1010701 / SGP-1010701
   BANG     NA       PB108      F000007970/25781/20009021
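For anyone who wants to try the answer below, the sample data can be rebuilt as a data frame. This is only a sketch of the data shown above: the column names are assumed (Item.Code follows the spelling used in the desired output and in the answer), the Region value "NA" is kept as a literal string rather than a missing value, and stringsAsFactors = FALSE keeps Item.Code a character column.

```r
# Rebuild the sample data from the question (column names are assumed)
df1 <- data.frame(
  Country   = c("IND", "THAI", "LUXE", "IND", "THAI", "BANG"),
  Region    = c("NA", "AP", "NA", "AP", "HP", "NA"),  # literal "NA" strings
  Molecule  = c("PB102", "PB103", "PB105", "PB106", "PB107", "PB108"),
  Item.Code = c("FR206985511",
                "BA-107603 / F000113361 / 107603",
                "1012701 / SGP-1012701 / F041701000",
                "AU206985211 / CA-F206985211",
                "F034702000 / 1010701 / SGP-1010701",
                "F000007970/25781/20009021"),
  stringsAsFactors = FALSE  # keep Item.Code as character, not factor
)
str(df1)
```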

I want to split the string values in the Item Code column on / and create a new row for each entry.

For instance, the desired output will be:

Country Region Molecule      Item.Code
    IND     NA    PB102    FR206985511
   THAI     AP    PB103      BA-107603
   THAI     AP    PB103     F000113361
   THAI     AP    PB103         107603
   LUXE     NA    PB105        1012701
   LUXE     NA    PB105    SGP-1012701
   LUXE     NA    PB105     F041701000
    IND     AP    PB106    AU206985211
    IND     AP    PB106  CA-F206985211
   THAI     HP    PB107     F034702000
   THAI     HP    PB107        1010701
   THAI     HP    PB107    SGP-1010701
   BANG     NA    PB108     F000007970
   BANG     NA    PB108          25781
   BANG     NA    PB108       20009021

I tried the following code:

library(splitstackshape)
df2=concat.split.multiple(df1,"Plant.Item.Code","/", direction="long")

but got the error:

"Error: memory exhausted (limit reached?)"

When I tried strsplit(), I got the following error message:

Error in strsplit(df1$Plant.Item.Code, "/") : non-character argument
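The "non-character argument" error usually means the column is a factor: data.frame() and read.table() converted strings to factors by default before R 4.0.0, and strsplit() only accepts character vectors. Below is a base-R sketch (not the method used in the answer) that converts the column with as.character() first and then repeats each row once per piece. The two-row df1 is a hypothetical excerpt, and the column name Item.Code stands in for Plant.Item.Code.

```r
# Hypothetical two-row excerpt; stringsAsFactors = TRUE reproduces the factor column
df1 <- data.frame(
  Country   = c("IND", "THAI"),
  Item.Code = c("FR206985511", "BA-107603 / F000113361 / 107603"),
  stringsAsFactors = TRUE
)

# strsplit() needs character input, so convert the factor first;
# the regex also swallows any spaces around each "/"
pieces <- strsplit(as.character(df1$Item.Code), "\\s*/\\s*")

# Repeat each original row once per piece, then overwrite the column
df2 <- df1[rep(seq_len(nrow(df1)), lengths(pieces)), , drop = FALSE]
df2$Item.Code <- unlist(pieces)
rownames(df2) <- NULL
df2
```

Applied to all six rows of the data, the same steps should give the 15 rows shown in the desired output.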

Any help would be appreciated.

Solution

Try the cSplit function (you are already using @Ananda's splitstackshape package). Note that it returns a data.table object, so make sure the data.table package is installed. If you want a data.frame back, you can convert the result in place with setDF(df2).

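library(splitstackshape)
# Split Item.Code on "/" and put each piece on its own row (direction = "long");
# white space around each piece is stripped by default
df2 <- cSplit(df1, "Item.Code", sep = "/", direction = "long")
df2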
#     Country Region Molecule      Item.Code
#  1:     IND     NA    PB102    FR206985511
#  2:    THAI     AP    PB103      BA-107603 
#  3:    THAI     AP    PB103     F000113361 
#  4:    THAI     AP    PB103         107603
#  5:    LUXE     NA    PB105        1012701 
#  6:    LUXE     NA    PB105    SGP-1012701 
#  7:    LUXE     NA    PB105     F041701000
#  8:     IND     AP    PB106    AU206985211 
#  9:     IND     AP    PB106  CA-F206985211
# 10:    THAI     HP    PB107     F034702000 
# 11:    THAI     HP    PB107        1010701 
# 12:    THAI     HP    PB107    SGP-1010701
# 13:    BANG     NA    PB108     F000007970
# 14:    BANG     NA    PB108          25781
# 15:    BANG     NA    PB108       20009021
