R: split a semicolon-delimited column into rows
Problem description
I am using R 2.15.0 in RStudio and have created an object from Excel using XLConnect, with 3000+ rows and 12 columns. I am trying to split a semicolon-delimited column into rows, but I don't know whether this is possible or how to do it. Below is an example of the data, using the 3 relevant columns. Any help on this would be grand.
Code that is working for 2 of the columns is below.
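library(stringr)
# Collapse all PolId strings for each Description into one ";"-separated string,
# then delete any id (or run of ids) that occurs again later in that string
v1 <- with(df, tapply(PolId, Description, FUN = function(x) {
  x1 <- paste(x, collapse = ";")
  gsub('(\\b\\S+\\b)(?=.*\\b\\1\\b.*);', '', x1, perl = TRUE)
}))
# Repeat each Description once per remaining id, and split the ids into a vector
Description <- rep(names(v1), str_count(v1, '\\w+'))
PolId <- scan(text = gsub(';+', ' ', v1), what = '', quiet = TRUE)
data.frame(PolId, Description)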
Sample data
PolId                  Description  Document.Type
ABC123;ABC456;ABC789;  TEST1        Pol1
ABC123;ABC456;ABC789;  TEST1        Pol1
ABC123;ABC456;ABC789;  TEST1        Pol1
AAA123;                TEST1        End1
AAA123;                TEST2        End2
ABB123;ABC123;         TEST3        End1
ABB123;ABC123;         TEST3        End1
I want the output to look like this (with the duplicate PolIds removed):
PolId   Description  Document.Type
ABC123  TEST1        Pol1
ABC456  TEST1        Pol1
ABC789  TEST1        Pol1
AAA123  TEST1        End1
AAA123  TEST2        End2
ABB123  TEST3        End1
ABC123  TEST3        End1
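The answers assume the sample data sits in a data.frame called df. For testing, one way to type it in is read.table() on the text above; this sketch assumes plain character columns (stringsAsFactors = FALSE), which may differ from what XLConnect actually returns.

```r
# Rebuild the sample data as a data.frame named df (character columns assumed)
df <- read.table(text = "
PolId Description Document.Type
ABC123;ABC456;ABC789; TEST1 Pol1
ABC123;ABC456;ABC789; TEST1 Pol1
ABC123;ABC456;ABC789; TEST1 Pol1
AAA123; TEST1 End1
AAA123; TEST2 End2
ABB123;ABC123; TEST3 End1
ABB123;ABC123; TEST3 End1
", header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types
str(df)
```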
Recommended answer
You could try unnest from tidyr after splitting the "PolId" column, and then keep the unique rows.
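library(dplyr)
library(tidyr)
# Turn PolId into a list-column of ids and expand it to one row per id;
# current tidyr's unnest() expects a data frame, not a bare named list
df %>%
  mutate(PolId = strsplit(as.character(PolId), ';')) %>%
  unnest(PolId) %>%
  unique()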
Or, using base R with stack/strsplit/duplicated: split "PolId" by the delimiter (;) with strsplit, name the output list elements with the "Description" column, stack the list to get a 'data.frame', and use duplicated to remove the duplicate rows.
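# Split PolId on ";", name the pieces by Description, and stack into a data.frame
df1 <- stack(setNames(strsplit(as.character(df$PolId), ';'), df$Description))
# Drop duplicated rows; rename stack()'s "values"/"ind" columns
setNames(df1[!duplicated(df1), ], c("PolId", "Description"))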
# PolId Description
#1 ABC123 TEST1
#2 ABC456 TEST1
#3 ABC789 TEST1
#10 AAA123 TEST1
#11 AAA123 TEST2
#12 ABB123 TEST3
#13 ABC123 TEST3
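stack() always names its output columns values and ind (ind being a factor built from the list names), which is why the result is renamed with setNames(). A small illustration on a made-up named list shaped like the output of setNames(strsplit(...), Description):

```r
# Hypothetical named list: names repeat, one per original row
ids <- setNames(list(c("ABC123", "ABC456"), "AAA123", "AAA123"),
                c("TEST1", "TEST1", "TEST2"))

s <- stack(ids)
# values holds the ids; ind holds the (repeated) list names as a factor
print(s)
```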
Or, without strsplit:
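library(stringr)
# Collapse all PolId strings for each Description into one ";"-separated string,
# then delete any id (or run of ids) that occurs again later in that string
v1 <- with(df, tapply(PolId, Description, FUN = function(x) {
  x1 <- paste(x, collapse = ";")
  gsub('(\\b\\S+\\b)(?=.*\\b\\1\\b.*);', '', x1, perl = TRUE)
}))
# Repeat each Description once per remaining id, and split the ids into a vector
Description <- rep(names(v1), str_count(v1, '\\w+'))
PolId <- scan(text = gsub(';+', ' ', v1), what = '', quiet = TRUE)
data.frame(PolId, Description)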
# PolId Description
#1 ABC123 TEST1
#2 ABC456 TEST1
#3 ABC789 TEST1
#4 AAA123 TEST1
#5 AAA123 TEST2
#6 ABB123 TEST3
#7 ABC123 TEST3
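The stack() and tapply() versions return only PolId and Description, while the question also needs Document.Type (and, in the real data, the other columns) carried along. A base R sketch, assuming character columns: split PolId, repeat each whole row once per id, then drop duplicated rows. The helper name split_rows() is hypothetical, and vapply(..., length, integer(1)) is used instead of lengths(), which only appeared in R 3.2.0.

```r
# Sample data from the question (character columns assumed)
df <- data.frame(
  PolId = c("ABC123;ABC456;ABC789;", "ABC123;ABC456;ABC789;",
            "ABC123;ABC456;ABC789;", "AAA123;", "AAA123;",
            "ABB123;ABC123;", "ABB123;ABC123;"),
  Description = c("TEST1", "TEST1", "TEST1", "TEST1", "TEST2", "TEST3", "TEST3"),
  Document.Type = c("Pol1", "Pol1", "Pol1", "End1", "End2", "End1", "End1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one delimited column into rows, keeping all other columns
split_rows <- function(data, col, sep = ";") {
  # strsplit() drops the trailing empty field left by a final ";"
  pieces <- strsplit(as.character(data[[col]]), sep, fixed = TRUE)
  n <- vapply(pieces, length, integer(1))
  # Repeat each original row once per id it was split into
  out <- data[rep(seq_len(nrow(data)), n), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  # Drop rows that are duplicated across all columns, then renumber
  out <- out[!duplicated(out), , drop = FALSE]
  rownames(out) <- NULL
  out
}

result <- split_rows(df, "PolId")
print(result)
```

Because whole rows are repeated, any number of extra columns (such as the 12 in the XLConnect object) come along unchanged; duplicates are judged on all columns together.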