R - eliminating duplicate values
Question
I have an input dataframe like this:
I want the output to look like this:
For example, I want to take the first value (mary has life) and scan it against all other rows that have the same COL1 entry. If a duplicate COL2 value is present, I need to eliminate only the duplicate while merging the non-duplicates. In other words, I want to do a pattern search: if the same pattern is present in another row, I want to eliminate the duplicate pattern and merge the non-duplicate parts.
I tried using the grepl and gsub functions but I am not able to get my desired result properly.
A simpler version of the input dataset is inserted below:
COL1  COL2
10    mary has life
10    Don mary has life
10    Britto mary has life
20    push them
20    push them fur
30    yell at this
30    this is yell at this
40    Year
40    Doggy
40    Horse
Recommended answer
Updated:
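library(dplyr)

# Rebuild the sample input from the question.
df <- read.table(
  text = "COL1; COL2
10; mary has life
10; Don mary has life
10; Britto mary has life
20; push them
20; push them fur
30; yell at this
30; this is yell at this",
  sep = ";", header = TRUE,
  strip.white = TRUE, stringsAsFactors = FALSE)

res <- df %>%
  group_by(COL1) %>%
  do(COL2 = {
    # The first COL2 value in each group is the pattern to search for.
    first_value <- .$COL2[[1]]
    paste(unlist(Reduce(function(a, b) {
      # Split on the literal pattern (fixed = TRUE, so it is not read as a
      # regex), trim whitespace, and keep only the non-empty leftovers.
      new_values <- trimws(strsplit(b, first_value, fixed = TRUE)[[1]])
      c(a, new_values[nzchar(new_values)])
    }, .$COL2)), collapse = ", ")
  })

# do() stores COL2 as a list column; flatten it to a character vector.
res$COL2 <- unlist(res$COL2)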
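For comparison, the same idea can be sketched in base R without dplyr. This is only a sketch under assumptions. The helper name merge_patterns is hypothetical. The question's desired output is not shown, so the comma-joined format is a guess. The sketch also assumes the pattern to search for is always the first COL2 value in each COL1 group.

```r
# Hypothetical helper (not from the original answer): collapse one group's
# COL2 values, keeping the first value once plus the leftover text of the
# remaining rows.
merge_patterns <- function(values) {
  pattern <- values[[1]]
  # Remove the literal pattern from every later value (fixed = TRUE avoids
  # regex interpretation), then trim surrounding whitespace.
  rest <- trimws(gsub(pattern, "", values[-1], fixed = TRUE))
  # Drop rows that contained nothing but the pattern.
  rest <- rest[nzchar(rest)]
  paste(c(pattern, rest), collapse = ", ")
}

# Sample input from the question, including the COL1 == 40 rows.
df <- data.frame(
  COL1 = c(10, 10, 10, 20, 20, 30, 30, 40, 40, 40),
  COL2 = c("mary has life", "Don mary has life", "Britto mary has life",
           "push them", "push them fur",
           "yell at this", "this is yell at this",
           "Year", "Doggy", "Horse"),
  stringsAsFactors = FALSE
)

# One merged string per COL1 group.
merged <- tapply(df$COL2, df$COL1, merge_patterns)
res <- data.frame(COL1 = as.numeric(names(merged)),
                  COL2 = unname(as.vector(merged)),
                  stringsAsFactors = FALSE)
print(res)
```

With this input, the COL1 == 10 group collapses to "mary has life, Don, Britto". Groups whose rows share no pattern, such as COL1 == 40, are simply joined as they are.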