How to merge multiple data.frames and sum and average columns at the same time in R

Question

I have over twenty data.frames with the same columns but differing numbers of rows. My goal is to merge the data.frames by the column "Name" (each value is a list of five names). While merging, I would like rows with the same Name to be combined by summing column A, summing column B, and taking the mean of column C.

Here is what I am currently doing.

First, I merge just two data.frames at a time.

DF <- merge(x = abc, y = def, by = "Name", all = TRUE)

The merged DF looks like this:

Name                              A.x    B.x   C.x   A.y   B.y  C.y
name1,name2,name3,name4,name5      11     24     7    NA    NA   NA
name1,name3,name4,name6,name7       4      8    12     3     4    8
name1,name2,name5,name6,name7      12      4     5    NA    NA   NA
name3,name4,name5,name6,name7      NA     NA    NA    15     3   28

I then add these ifelse statements to deal with the NAs and the non-unique rows. For rows that appear in both data.frames, A and B are summed and C is averaged.

DF$A <- ifelse(is.na(DF$A.x), DF$A.y,
    ifelse(is.na(DF$A.y), DF$A.x,
        ifelse((!is.na(DF$A.x)) & (!is.na(DF$A.y)), DF$A.x + DF$A.y, 1)))
DF$B <- ifelse(is.na(DF$B.x), DF$B.y, 
    ifelse(is.na(DF$B.y), DF$B.x,
        ifelse((!is.na(DF$B.x)) & (!is.na(DF$B.y)), DF$B.x + DF$B.y, 1)))
DF$C <- ifelse(is.na(DF$C.x), DF$C.y, 
    ifelse(is.na(DF$C.y), DF$C.x,
        ifelse((!is.na(DF$C.x)) & (!is.na(DF$C.y)), (DF$C.x  + DF$C.y)/2, 1)))

DF now looks like this:

Name                            A.x  B.x  C.x  A.y  B.y  C.y   A   B   C   
name1,name2,name3,name4,name5    11   24    7   NA   NA   NA  11  24   7
name1,name3,name4,name6,name7     4    8   12    3    4    8   7  12  10
name1,name2,name5,name6,name7    12    4    5   NA   NA   NA  12   4   5
name3,name4,name5,name6,name7    NA   NA   NA   15    3   28  15   3  28
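As a side note, the nested ifelse logic can probably be written more compactly with rowSums() and rowMeans() using na.rm = TRUE. The sketch below rebuilds the merged table by hand from the values shown above (the data.frame construction is only for illustration) and is an assumption about what the ifelse chain is meant to do, not a drop-in replacement for every case: if both the .x and .y values of a row were NA, rowSums() would return 0 and rowMeans() would return NaN.

```r
# Toy copy of the merged table from the question (values taken from it).
DF <- data.frame(
  Name = c("name1,name2,name3,name4,name5",
           "name1,name3,name4,name6,name7",
           "name1,name2,name5,name6,name7",
           "name3,name4,name5,name6,name7"),
  A.x = c(11, 4, 12, NA), B.x = c(24, 8, 4, NA), C.x = c(7, 12, 5, NA),
  A.y = c(NA, 3, NA, 15), B.y = c(NA, 4, NA, 3), C.y = c(NA, 8, NA, 28)
)

# Sum A and B across the two sources, ignoring NAs.
DF$A <- rowSums(DF[c("A.x", "A.y")], na.rm = TRUE)
DF$B <- rowSums(DF[c("B.x", "B.y")], na.rm = TRUE)

# Average C across the two sources, ignoring NAs.
DF$C <- rowMeans(DF[c("C.x", "C.y")], na.rm = TRUE)
```

With the values above, the A, B and C columns come out the same as in the table.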

I then keep just the Name column and the last three columns:

merge1 <- DF[c(1,8,9,10)]

Then I repeat the same process for the next two data.frames and call the result merge2. Then I merge merge1 and merge2.

total1 <- merge(x = merge1, y = merge2, by = "Name", all = TRUE)

I continue merging two data.frames at a time, and then merge the resulting totals data.frames together, also two at a time. I get the end result I want, but it is a time-consuming process and not very efficient.

Another way I think I could do it is to rbind all the data.frames and then, wherever rows in the Name column share the same list of names, collapse them into one row by summing column A, summing column B, and taking the mean of column C. But I am not sure how to do that either.

Here is an example of what the rbind result would look like:

Name                              A     B     C
name1,name2,name3,name4,name5    11    24     7
name1,name3,name4,name6,name7     4     8    12
name1,name2,name5,name6,name7    12     4     5 
name3,name4,name5,name6,name7    15     3    28 
name1,name3,name4,name6,name7     3     4     8

The end result should look like this:

Name                              A     B     C
name1,name2,name3,name4,name5    11    24     7
name1,name3,name4,name6,name7     7    12    10
name1,name2,name5,name6,name7    12     4     5 
name3,name4,name5,name6,name7    15     3    28 

Again, I am sure there are more efficient ways to do what I want than what I am currently doing, so any help would be greatly appreciated.

Recommended answer

I think your second approach is the way to go, and you can do that with data.table or dplyr.

Here are a few steps using data.table. First, if your data frames are abc, def, ..., do:

DF <- do.call(rbind, list(abc,def,...))
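With twenty or more data.frames, it may be easier to build the list once and pass it to do.call(). The sketch below uses two small made-up data.frames as stand-ins (the names n1 to n4 and all values are hypothetical); the real ones would be added to the list in the same way.

```r
# Hypothetical stand-ins for the real data.frames:
# same columns, different numbers of rows.
abc <- data.frame(Name = c("n1", "n2", "n3"),
                  A = c(1, 2, 3), B = c(4, 5, 6), C = c(7, 8, 9))
def <- data.frame(Name = c("n2", "n4"),
                  A = c(10, 20), B = c(30, 40), C = c(50, 60))

# Collect every data.frame in one list; add the remaining ones here.
dfs <- list(abc, def)

# Stack all rows. rbind() on data.frames matches columns by name,
# so every data.frame needs the same column names.
DF <- do.call(rbind, dfs)
```

data.table also provides rbindlist(), which takes such a list directly and could be used in place of do.call(rbind, ...).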

Now you can convert the result into a data.table:

library(data.table)
DT <- data.table(DF)

and then simply do something like:

DTres <- DT[, .(A = sum(A, na.rm = TRUE), B = sum(B, na.rm = TRUE), C = mean(C, na.rm = TRUE)), by = Name]
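To check the grouped aggregation against the numbers in the question, here is a small self-contained sketch. It builds the stacked table by hand from the five rows shown in the question's rbind example, instead of from the real data.frames, and assumes the data.table package is installed.

```r
library(data.table)

# The five stacked rows from the question's rbind example.
DT <- data.table(
  Name = c("name1,name2,name3,name4,name5",
           "name1,name3,name4,name6,name7",
           "name1,name2,name5,name6,name7",
           "name3,name4,name5,name6,name7",
           "name1,name3,name4,name6,name7"),
  A = c(11, 4, 12, 15, 3),
  B = c(24, 8, 4, 3, 4),
  C = c(7, 12, 5, 28, 8)
)

# One row per distinct Name: sum A and B, average C.
DTres <- DT[, .(A = sum(A, na.rm = TRUE),
                B = sum(B, na.rm = TRUE),
                C = mean(C, na.rm = TRUE)),
            by = Name]
```

The duplicated name list (name1,name3,name4,name6,name7) collapses to A = 7, B = 12, C = 10, as in the desired end result. One difference from the pairwise merging in the question: here C is the mean over all rows that share a Name, which is not the same as averaging two at a time when a Name occurs in more than two data.frames.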

Check the data.table vignettes to get a better idea of how the package works.
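For the dplyr route mentioned above, a roughly equivalent sketch might look like the following. The two data.frames are rebuilt from the rows in the question for illustration only, and the code assumes the dplyr package is installed.

```r
library(dplyr)

# abc and def rebuilt from the rows shown in the question.
abc <- data.frame(
  Name = c("name1,name2,name3,name4,name5",
           "name1,name3,name4,name6,name7",
           "name1,name2,name5,name6,name7"),
  A = c(11, 4, 12), B = c(24, 8, 4), C = c(7, 12, 5)
)
def <- data.frame(
  Name = c("name1,name3,name4,name6,name7",
           "name3,name4,name5,name6,name7"),
  A = c(3, 15), B = c(4, 3), C = c(8, 28)
)

# Stack the rows, then summarise once per distinct Name.
res <- bind_rows(abc, def) %>%
  group_by(Name) %>%
  summarise(A = sum(A, na.rm = TRUE),
            B = sum(B, na.rm = TRUE),
            C = mean(C, na.rm = TRUE))
```

bind_rows() accepts any number of data.frames (or a list of them), so the same pipeline should extend to all twenty.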
