Merge data.frames with duplicates


Question

I have many data.frames, for example:

df1 = data.frame(names=c('a','b','c','c','d'),data1=c(1,2,3,4,5))
df2 = data.frame(names=c('a','e','e','c','c','d'),data2=c(1,2,3,4,5,6))
df3 = data.frame(names=c('c','e'),data3=c(1,2))

I need to merge these data.frames without dropping the duplicated names. The desired result is:

> result
  names data1 data2 data3
1   'a'     1     1    NA
2   'b'     2    NA    NA
3   'c'     3     4     1
4   'c'     4     5    NA
5   'd'     5     6    NA
6   'e'    NA     2     2
7   'e'    NA     3    NA

I can't find a function like merge with an option for handling duplicated names. Thank you for your help.

To clarify my problem: the data come from a biological experiment in which each sample has a different number of replicates. I need to merge all of the experiments and produce the table above. I can't generate a unique identifier for the replicates.

Recommended answer

First define a function, run.seq, which numbers the duplicates of each name (1 for the first occurrence, 2 for the second, and so on). Judging from the desired output, the ith occurrence of a name in one data frame should be matched with the ith occurrence of that name in the others. Then create a list of the data frames and add a run.seq column to each one. Finally, use Reduce to merge them all; because names and run.seq are the only columns the data frames share, merge joins on both.

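# Occurrence number of each value within x (1st, 2nd, ... copy of the same name)
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

# Add the occurrence number to every data frame as a second join key
L <- list(df1, df2, df3)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$names)))

# Full outer join on names and run.seq, then drop the helper column (column 2)
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]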

The last line gives:

> out
  names data1 data2 data3
1     a     1     1    NA
2     b     2    NA    NA
3     c     3     4     1
4     c     4     5    NA
5     d     5     6    NA
6     e    NA     2     2
7     e    NA     3    NA
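
To see why the rows line up this way, it can help to look at the helper column before it is dropped. The following is a minimal sketch that reuses df1 and df2 from the question; the variable names seq1, seq2 and both are introduced here only for illustration.

```r
df1 <- data.frame(names = c('a','b','c','c','d'), data1 = c(1,2,3,4,5))
df2 <- data.frame(names = c('a','e','e','c','c','d'), data2 = c(1,2,3,4,5,6))

run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

# Occurrence number of each name within its own data frame
seq1 <- run.seq(df1$names)  # 1 1 1 2 1
seq2 <- run.seq(df2$names)  # 1 1 2 1 2 1

# merge() joins on all shared columns (names and run.seq), so the
# first 'c' of df1 meets the first 'c' of df2, the second meets the second
both <- merge(cbind(df1, run.seq = seq1),
              cbind(df2, run.seq = seq2),
              all = TRUE)
print(both)
```

Keeping run.seq in the printed result shows which copy of each name every row corresponds to; the answer removes it at the end with [-2].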

Edit: revised run.seq so that the input need not be sorted.
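
A small check of that property, as a sketch: the vector x below is made-up test input, and the run-length counter is shown only for contrast, since the source does not say what the earlier version of run.seq looked like.

```r
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

# Hypothetical unsorted input: copies of the same name are not adjacent
x <- c('c', 'a', 'c', 'b', 'a', 'c')

# ave() groups by value, so each name is counted across the whole vector
by_value <- run.seq(x)                   # 1 1 2 1 2 3

# A counter based on consecutive runs only works when equal names are adjacent
by_run <- sequence(rle(x)$lengths)       # 1 1 1 1 1 1

print(by_value)
print(by_run)
```

Because ave groups on the values themselves rather than on consecutive runs, the data frames can be passed in whatever row order they arrive in.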
