Merge data.frames with duplicates


Question

I have many data.frames, for example:

df1 = data.frame(names=c('a','b','c','c','d'),data1=c(1,2,3,4,5))
df2 = data.frame(names=c('a','e','e','c','c','d'),data2=c(1,2,3,4,5,6))
df3 = data.frame(names=c('c','e'),data3=c(1,2))

and I need to merge these data.frames without deleting the duplicated names:

> result
  names data1 data2 data3
1   'a'     1     1    NA
2   'b'     2    NA    NA
3   'c'     3     4     1
4   'c'     4     5    NA
5   'd'     5     6    NA
6   'e'    NA     2     2
7   'e'    NA     3    NA

I can't find a function like merge with an option to handle duplicated names. Thank you for your help. To clarify my problem: the data come from a biological experiment in which each sample has a different number of replicates. I need to merge all the experiments and produce this table. I can't generate a unique identifier for the replicates.

Recommended answer

First define a function, run.seq, that assigns sequence numbers to duplicates, since the desired output suggests that the i-th occurrence of each name in one data frame should be matched with the i-th occurrence of that name in the others. Then create a list of the data frames and add a run.seq column to each one. Finally, use Reduce to merge them all.

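# Number the occurrences of each value: the i-th duplicate of a name gets i
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

# Add a run.seq column to every data frame
L <- list(df1, df2, df3)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$names)))

# Full outer merge on the shared columns (names, run.seq), then drop run.seq (column 2)
out <- Reduce(function(...) merge(..., all = TRUE), L2)[-2]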

The last line gives:

> out
  names data1 data2 data3
1     a     1     1    NA
2     b     2    NA    NA
3     c     3     4     1
4     c     4     5    NA
5     d     5     6    NA
6     e    NA     2     2
7     e    NA     3    NA
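
The same idea can be written with the merge key spelled out, which may be easier to follow than relying on merge to pick up the shared columns and on dropping the helper column by position. This is only a sketch under the question's example data; the names keyed and merge_by_key are illustrative and not part of the original answer.

```r
# Example data from the question
df1 <- data.frame(names = c("a", "b", "c", "c", "d"), data1 = c(1, 2, 3, 4, 5))
df2 <- data.frame(names = c("a", "e", "e", "c", "c", "d"), data2 = c(1, 2, 3, 4, 5, 6))
df3 <- data.frame(names = c("c", "e"), data3 = c(1, 2))

# Occurrence counter within each name, as defined in the answer
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

# Attach the counter to every data frame (keyed is a hypothetical name)
keyed <- lapply(list(df1, df2, df3),
                function(d) cbind(d, run.seq = run.seq(d$names)))

# Full outer merge on the explicit composite key (merge_by_key is hypothetical)
merge_by_key <- function(a, b) merge(a, b, by = c("names", "run.seq"), all = TRUE)
out <- Reduce(merge_by_key, keyed)

# Drop the helper column by name rather than by position
out$run.seq <- NULL
print(out)
```

Naming the key columns in by makes the join explicit: the pair (names, run.seq) is what matches the i-th occurrence of a name across data frames.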

Note: run.seq was revised so that the input need not be sorted.
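
To illustrate the behaviour on unsorted input, here is a small check of run.seq on a vector whose duplicates are not adjacent. The vector unsorted is made up for illustration and does not come from the question.

```r
# Occurrence counter within each value, as defined in the answer
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

# Hypothetical input in which equal values are not next to each other
unsorted <- c("c", "a", "c", "b", "a")

# Each element receives its occurrence number within its own group
idx <- run.seq(unsorted)
print(idx)  # 1 1 2 1 2
```

Because ave applies seq_along within each group and writes the results back in the original positions, the counter does not depend on the order of the input.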
