R: split a semicolon-delimited column into rows
Question
I am using R 2.15.0 in RStudio and have used XLConnect to create an object from Excel with 3000+ rows and 12 columns. I am trying to split one column into separate rows, but I don't know whether this is possible or how to do it. Below is an example of the data, using the 3 relevant columns. Any help on this would be grand.
Code that is working for 2 of the columns is below.
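library(stringr)
# For each Description, join its PolId strings with ";" and use a Perl
# lookahead regex to delete earlier copies of IDs that occur again later
v1 <- with(df, tapply(PolId, Description, FUN = function(x) {
  x1 <- paste(x, collapse = ";")
  gsub('(\\b\\S+\\b)(?=.*\\b\\1\\b.*);', '', x1, perl = TRUE)
}))
# Repeat each Description once per ID left in its string
Description <- rep(names(v1), str_count(v1, '\\w+'))
# Turn the remaining IDs into a character vector
PolId <- scan(text = gsub(';+', ' ', v1), what = '', quiet = TRUE)
data.frame(PolId, Description)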
Sample data
PolId                  Description  Document.Type
ABC123;ABC456;ABC789;  TEST1        Pol1
ABC123;ABC456;ABC789;  TEST1        Pol1
ABC123;ABC456;ABC789;  TEST1        Pol1
AAA123;                TEST1        End1
AAA123;                TEST2        End2
ABB123;ABC123;         TEST3        End1
ABB123;ABC123;         TEST3        End1
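To make the examples reproducible, the sample data can be typed in directly as a data frame named df, the name the code in this thread uses. This is a sketch that assumes all three columns are plain character strings; the object XLConnect actually returns may store them differently (for example as factors).

```r
# Sample data from the question; PolId is kept as character, not factor
df <- data.frame(
  PolId = c("ABC123;ABC456;ABC789;", "ABC123;ABC456;ABC789;",
            "ABC123;ABC456;ABC789;", "AAA123;", "AAA123;",
            "ABB123;ABC123;", "ABB123;ABC123;"),
  Description = c("TEST1", "TEST1", "TEST1", "TEST1", "TEST2", "TEST3", "TEST3"),
  Document.Type = c("Pol1", "Pol1", "Pol1", "End1", "End2", "End1", "End1"),
  stringsAsFactors = FALSE
)
str(df)
```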
I want the output to look like this (with the duplicate PolIds removed):
PolId   Description  Document.Type
ABC123  TEST1        Pol1
ABC456  TEST1        Pol1
ABC789  TEST1        Pol1
AAA123  TEST1        End1
AAA123  TEST2        End2
ABB123  TEST3        End1
ABC123  TEST3        End1
Answer
You could try unnest from tidyr after splitting the "PolId" column, and then keep only the unique rows.
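library(dplyr)
library(tidyr)
# Turn PolId into a list-column of IDs, expand it to one row per ID,
# then keep only distinct rows (the other columns are carried along)
df %>%
  mutate(PolId = strsplit(as.character(PolId), ";")) %>%
  unnest(PolId) %>%
  distinct()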
Or, using base R with stack/strsplit/duplicated: split "PolId" by the delimiter (;) with strsplit, name the elements of the resulting list with the "Description" column, stack the list to get a data.frame, and use duplicated to remove the duplicate rows.
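# Split PolId on ";", name each piece with its Description, and stack into a data.frame
df1 <- stack(setNames(strsplit(as.character(df$PolId), ';'), df$Description))
# Drop duplicated rows and give the two stacked columns meaningful names
setNames(df1[!duplicated(df1), ], c("PolId", "Description"))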
# PolId Description
#1 ABC123 TEST1
#2 ABC456 TEST1
#3 ABC789 TEST1
#10 AAA123 TEST1
#11 AAA123 TEST2
#12 ABB123 TEST3
#13 ABC123 TEST3
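The stack() version keeps only PolId and Description, while the desired output also carries Document.Type. One base R way to keep every column, sketched here under the assumption that PolId is character, is to repeat each row once per ID with rep() and lengths(), overwrite PolId with the individual IDs, and then drop duplicate rows with unique(). The names ids and out are only illustrative.

```r
# Sample data from the question (PolId as character)
df <- data.frame(
  PolId = c("ABC123;ABC456;ABC789;", "ABC123;ABC456;ABC789;",
            "ABC123;ABC456;ABC789;", "AAA123;", "AAA123;",
            "ABB123;ABC123;", "ABB123;ABC123;"),
  Description = c("TEST1", "TEST1", "TEST1", "TEST1", "TEST2", "TEST3", "TEST3"),
  Document.Type = c("Pol1", "Pol1", "Pol1", "End1", "End2", "End1", "End1"),
  stringsAsFactors = FALSE
)

# Split each PolId; strsplit() drops the empty piece after the trailing ";"
ids <- strsplit(df$PolId, ";", fixed = TRUE)

# Repeat each row once per ID it contains, then store the single IDs in PolId
out <- df[rep(seq_len(nrow(df)), lengths(ids)), ]
out$PolId <- unlist(ids)

# Remove duplicate rows (across all columns) and renumber the rows
out <- unique(out)
rownames(out) <- NULL
out
```

Because unique() compares whole rows, AAA123 appears twice in the result (once for TEST1/End1 and once for TEST2/End2), which matches the desired output.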
Or, without using strsplit:
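library(stringr)
# For each Description, join its PolId strings with ";" and use a Perl
# lookahead regex to delete earlier copies of IDs that occur again later
v1 <- with(df, tapply(PolId, Description, FUN = function(x) {
  x1 <- paste(x, collapse = ";")
  gsub('(\\b\\S+\\b)(?=.*\\b\\1\\b.*);', '', x1, perl = TRUE)
}))
# Repeat each Description once per ID left in its string
Description <- rep(names(v1), str_count(v1, '\\w+'))
# Turn the remaining IDs into a character vector
PolId <- scan(text = gsub(';+', ' ', v1), what = '', quiet = TRUE)
data.frame(PolId, Description)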
# PolId Description
#1 ABC123 TEST1
#2 ABC456 TEST1
#3 ABC789 TEST1
#4 AAA123 TEST1
#5 AAA123 TEST2
#6 ABB123 TEST3
#7 ABC123 TEST3
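For comparison, the same per-Description grouping can be done without the lookahead regex: inside tapply, split all of a group's PolId strings and keep each ID once with unique(). This swaps the regex for strsplit() and unique(), so it is an alternative sketch rather than the answer's method; it assumes PolId is character, and v2 and res are illustrative names.

```r
# Sample data from the question (only the two columns used here)
df <- data.frame(
  PolId = c("ABC123;ABC456;ABC789;", "ABC123;ABC456;ABC789;",
            "ABC123;ABC456;ABC789;", "AAA123;", "AAA123;",
            "ABB123;ABC123;", "ABB123;ABC123;"),
  Description = c("TEST1", "TEST1", "TEST1", "TEST1", "TEST2", "TEST3", "TEST3"),
  stringsAsFactors = FALSE
)

# For each Description, split all of its PolId strings and keep each ID once
v2 <- tapply(df$PolId, df$Description,
             function(x) unique(unlist(strsplit(x, ";", fixed = TRUE))))

# Flatten the per-group vectors back into a long data frame
res <- data.frame(PolId = unlist(v2, use.names = FALSE),
                  Description = rep(names(v2), lengths(v2)),
                  stringsAsFactors = FALSE)
res
```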