Splitting a string into new rows in R
Problem description
I have a data set like the one below:

    Country Region Molecule Item Code
    IND     NA     PB102    FR206985511
    THAI    AP     PB103    BA-107603 / F000113361 / 107603
    LUXE    NA     PB105    1012701 / SGP-1012701 / F041701000
    IND     AP     PB106    AU206985211 / CA-F206985211
    THAI    HP     PB107    F034702000 / 1010701 / SGP-1010701
    BANG    NA     PB108    F000007970/25781/20009021
I want to split the string values in the ITEMCODE column on / and create a new row for each entry. For instance, the desired output would be:

    Country Region Molecule Item.Code
    IND     NA     PB102    FR206985511
    THAI    AP     PB103    BA-107603
    THAI    AP     PB103    F000113361
    THAI    AP     PB103    107603
    LUXE    NA     PB105    1012701
    LUXE    NA     PB105    SGP-1012701
    LUXE    NA     PB105    F041701000
    IND     AP     PB106    AU206985211
    IND     AP     PB106    CA-F206985211
    THAI    HP     PB107    F034702000
    THAI    HP     PB107    1010701
    THAI    HP     PB107    SGP-1010701
    BANG    NA     PB108    F000007970
    BANG    NA     PB108    25781
    BANG    NA     PB108    20009021
I tried the code below:

    library(splitstackshape)
    df2 = concat.split.multiple(df1, "Plant.Item.Code", "/", direction = "long")

but got the error:

    Error: memory exhausted (limit reached?)
When I tried strsplit(), I got the error message below:

    Error in strsplit(df1$Plant.Item.Code, "/") : non-character argument
Any help will be appreciated.
Solution

Try the cSplit function (as you are already using @Ananda's package). Note that it will return a data.table object, so make sure you have the data.table package installed. You can revert back to a data.frame (if you want to) by doing something like setDF(df2).
    library(splitstackshape)
    df2 <- cSplit(df1, "Item.Code", sep = "/", direction = "long")
    df2
    #     Country Region Molecule     Item.Code
    #  1:     IND     NA    PB102   FR206985511
    #  2:    THAI     AP    PB103     BA-107603
    #  3:    THAI     AP    PB103    F000113361
    #  4:    THAI     AP    PB103        107603
    #  5:    LUXE     NA    PB105       1012701
    #  6:    LUXE     NA    PB105   SGP-1012701
    #  7:    LUXE     NA    PB105    F041701000
    #  8:     IND     AP    PB106   AU206985211
    #  9:     IND     AP    PB106 CA-F206985211
    # 10:    THAI     HP    PB107    F034702000
    # 11:    THAI     HP    PB107       1010701
    # 12:    THAI     HP    PB107   SGP-1010701
    # 13:    BANG     NA    PB108    F000007970
    # 14:    BANG     NA    PB108         25781
    # 15:    BANG     NA    PB108      20009021
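As a side note on the strsplit() error from the question: "non-character argument" is what strsplit() reports when it is handed a factor, which is how string columns were read by default before R 4.0.0. If you would rather stay in base R, something along these lines may work. This is only a sketch: the df1 built here is a hand-typed stand-in for the question's data, the column name Item.Code is assumed from the answer above, and stringsAsFactors = TRUE is set on purpose to mimic the factor column.

```r
# Hand-typed stand-in for the question's data (an assumption, not the real df1).
# stringsAsFactors = TRUE mimics the factor column that makes strsplit() fail.
df1 <- data.frame(
  Country   = c("IND", "THAI", "LUXE", "IND", "THAI", "BANG"),
  Region    = c("NA", "AP", "NA", "AP", "HP", "NA"),
  Molecule  = c("PB102", "PB103", "PB105", "PB106", "PB107", "PB108"),
  Item.Code = c("FR206985511",
                "BA-107603 / F000113361 / 107603",
                "1012701 / SGP-1012701 / F041701000",
                "AU206985211 / CA-F206985211",
                "F034702000 / 1010701 / SGP-1010701",
                "F000007970/25781/20009021"),
  stringsAsFactors = TRUE
)

# Convert the factor to character first, then split on "/" together with
# any whitespace around it.
parts <- strsplit(as.character(df1$Item.Code), "\\s*/\\s*")

# Repeat each original row once per piece, then overwrite Item.Code with
# the individual pieces (trimmed in case of stray outer whitespace).
idx <- rep(seq_len(nrow(df1)), lengths(parts))
df2 <- df1[idx, ]
df2$Item.Code <- trimws(unlist(parts))
rownames(df2) <- NULL

df2
```

The result is a plain data.frame with one row per item code, so no setDF() step is needed. For large inputs the cSplit approach above is probably the more convenient choice.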