R分号将一列分隔成几行 [英] R semicolon delimited a column into rows

查看:158
本文介绍了R分号将一列分隔成几行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用RStudio 2.15.0,并且已经使用XLConnect从Excel创建了一个对象,该对象具有3000多个行和12列,我试图将一列分隔/拆分为行,但是不知道这是否可行或如何做吧。下面的数据示例使用3列进行连接。任何帮助都会很大。

I am using RStudio 2.15.0 and have created an object from Excel using XLConnect with 3000+ rows and 12 columns I am trying to delimit/split a column into the rows but don't know if this is possible or how to do it. Example of the data below using the 3 columns in connection. any help on this would be grand.

适用于以下两列的代码如下。

Code that is working for 2 of the columns is below.

v1 <- with(df, tapply(PolId, Description,  FUN= function(x) {
x1 <- paste(x, collapse=";")
gsub('(\\b\\S+\\b)(?=.*\\b\\1\\b.*);', '',     x1, perl=TRUE)}))
library(stringr)
Description <- rep(names(v1),  str_count(v1, '\\w+'))
PolId <- scan(text=gsub(';+', ' ', v1), what='', quiet=TRUE)
data.frame(PolId, Description)  

样本数据

PolId   Description  Document.Type
ABC123;ABC456;ABC789;   TEST1  Pol1
ABC123;ABC456;ABC789;   TEST1  Pol1
ABC123;ABC456;ABC789;   TEST1  Pol1
AAA123; TEST1  End1
AAA123; TEST2  End2
ABB123;ABC123;  TEST3  End1
ABB123;ABC123;  TEST3  End1

我希望输出是这样的(替换重复的Polid)

I want the output to be like this (replacing the duplicate Polid's)

PolId   Description  Document.Type
ABC123  TEST1        Pol1
ABC456  TEST1        Pol1
ABC789  TEST1        Pol1
AAA123  TEST1        End1
AAA123  TEST2        End2
ABB123  TEST3        End1
ABC123  TEST3        End1


推荐答案

在拆分 PolId后,您可以尝试从 tidyr 中的 unstest 列并获取唯一行

You could try unnest from tidyr after splitting the "PolId" column and get the unique rows

library(dplyr)
library(tidyr)
 unnest(setNames(strsplit(df$PolId, ';'), df$Description), 
                                  Description) %>% unique()

或将 base R stack / strsplit / duplicated一起使用。用定界符(; )拆分 PolId( strsplit ),并用 Description命名输出列表元素列,在 stack 列表中获取 data.frame,并使用 duplicated 删除重复的行。

Or using base R with stack/strsplit/duplicated. Split the "PolId" (strsplit) by the delimiter(;), name the output list elements with "Description" column, stack the list to get a 'data.frame' and use duplicated to remove the duplicate rows.

df1 <- stack(setNames(strsplit(df$PolId, ';'), df$Description))
setNames(df1[!duplicated(df1),], names(df))
#     PolId Description
#1  ABC123       TEST1
#2  ABC456       TEST1
#3  ABC789       TEST1
#10 AAA123       TEST1
#11 AAA123       TEST2
#12 ABB123       TEST3
#13 ABC123       TEST3

或不使用 strsplit

v1 <- with(df, tapply(PolId, Description, FUN= function(x) {
            x1 <- paste(x, collapse=";")
        gsub('(\\b\\S+\\b)(?=.*\\b\\1\\b.*);', '', x1, perl=TRUE)}))
library(stringr)
Description <- rep(names(v1),  str_count(v1, '\\w+'))
PolId <- scan(text=gsub(';+', ' ', v1), what='', quiet=TRUE)
data.frame(PolId, Description)
#   PolId Description
#1 ABC123       TEST1
#2 ABC456       TEST1
#3 ABC789       TEST1
#4 AAA123       TEST1
#5 AAA123       TEST2
#6 ABB123       TEST3
#7 ABC123       TEST3

这篇关于R分号将一列分隔成几行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆