R - eliminating duplicate values

Question

I have an input dataframe like this:

I want the output to be as follows:

For example, I want to take the first value (mary has life) and scan it against all other rows that have the same COL1 entry. If that value is present in their COL2, I need to eliminate only the duplicated part while merging the non-duplicated parts. In other words, I want to do a pattern search: if the same pattern is present in another row, I just want to eliminate the duplicate pattern and merge the non-duplicate parts.

I tried using the grepl and gsub functions, but I am not able to get the desired result.
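Since grepl and gsub were the functions tried, here is one way they could be combined, as a minimal base R sketch. It assumes, as in the example rows, that the first COL2 value of each COL1 group is the pattern to strip from the other rows of that group. The helper name merge_group is hypothetical, and the comma-separated output format is an assumption.

```r
# Hypothetical helper: merge one group's COL2 values by removing the
# group's first value (treated as a literal pattern) from the other rows.
merge_group <- function(values) {
  pattern <- values[[1]]
  rest <- values[-1]
  # grepl(fixed = TRUE) flags rows that contain the pattern literally
  has_pattern <- grepl(pattern, rest, fixed = TRUE)
  # gsub(fixed = TRUE) deletes the pattern from those rows; others are kept whole
  rest[has_pattern] <- gsub(pattern, "", rest[has_pattern], fixed = TRUE)
  rest <- trimws(rest)
  # Drop rows that were exact duplicates of the pattern, and repeated fragments
  rest <- unique(rest[rest != ""])
  paste(c(pattern, rest), collapse = ", ")
}

df <- data.frame(
  COL1 = c(10, 10, 10, 20, 20, 30, 30, 40, 40, 40),
  COL2 = c("mary has life", "Don mary has life", "Britto mary has life",
           "push them", "push them fur",
           "yell at this", "this is yell at this",
           "Year", "Doggy", "Horse"),
  stringsAsFactors = FALSE
)

# Apply the helper per COL1 group; sapply keeps the group keys as names
merged <- sapply(split(df$COL2, df$COL1), merge_group)
print(merged)
```

Rows that do not contain the pattern, such as Doggy and Horse in group 40, are kept whole, so that group simply becomes "Year, Doggy, Horse".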

A simpler version of the input dataset is inserted below:

COL1  COL2
10    mary has life
10    Don mary has life
10    Britto mary has life
20    push them
20    push them fur
30    yell at this
30    this is yell at this
40    Year
40    Doggy
40    Horse

Recommended answer

Update:

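library(dplyr)

# Semicolon-separated sample data so that COL2 can contain spaces
df <- read.table(
  text = "COL1;    COL2
10;  mary has life
10;  Don mary has life
10;  Britto mary has life
20;  push them
20;  push them fur
30;  yell at this
30;  this is yell at this",
  sep = ";", header = TRUE,
  strip.white = TRUE, stringsAsFactors = FALSE)

res <- df %>%
  group_by(COL1) %>%
  summarise(COL2 = {
    # The first COL2 value of each group is the pattern to remove from the others
    first_value <- COL2[[1]]
    # Split every later value on that pattern (fixed = TRUE: literal, not regex)
    # and collect the trimmed leftover fragments
    merged <- Reduce(function(a, b) {
      new_values <- trimws(strsplit(b, first_value, fixed = TRUE)[[1]])
      c(a, new_values)
    }, COL2)
    # Drop empty fragments and merge the rest into one string
    paste(merged[merged != ""], collapse = ", ")
  })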
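If dplyr is not available, the same Reduce/strsplit idea can be run per group with base R's aggregate(). This is a sketch under the same assumption as the answer (the first COL2 value of each COL1 group is the pattern). merge_fragments is a hypothetical helper name, and the data frame is built directly instead of with read.table.

```r
# Same sample rows as in the answer
df <- data.frame(
  COL1 = c(10, 10, 10, 20, 20, 30, 30),
  COL2 = c("mary has life", "Don mary has life", "Britto mary has life",
           "push them", "push them fur",
           "yell at this", "this is yell at this"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split every later value on the group's first value
# (fixed = TRUE, literal match) and keep the trimmed, non-empty fragments.
merge_fragments <- function(values) {
  first_value <- values[[1]]
  merged <- Reduce(function(a, b) {
    c(a, trimws(strsplit(b, first_value, fixed = TRUE)[[1]]))
  }, values)
  paste(merged[merged != ""], collapse = ", ")
}

# aggregate() applies the helper to COL2 within each COL1 group
res <- aggregate(COL2 ~ COL1, data = df, FUN = merge_fragments)
print(res)
```

aggregate() returns an ordinary data frame sorted by COL1 with COL2 as a character column, so no unlist() step is needed afterwards.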
