How to merge multiple data.frames and sum and average columns at the same time in R


Question

I have over 20 data.frames with the same columns but differing numbers of rows. My goal is to merge the data.frames by the column "Name" (which is a list of five names), and while merging I would like rows with the same name to sum column A, sum column B, and get the mean of column C.

Here is what I am currently doing.

First, I merge just two data.frames at a time.

DF <- merge(x = abc, y = def, by = "Name", all = TRUE)

The merged DF looks like this:

Name                              A.x    B.x   C.x   A.y   B.y  C.y
name1,name2,name3,name4,name5      11     24     7    NA    NA   NA
name1,name3,name4,name6,name7       4      8    12     3     4    8
name1,name2,name5,name6,name7      12      4     5    NA    NA   NA
name3,name4,name5,name6,name7      NA     NA    NA    15     3   28

I then add these ifelse statements to deal with the NAs and the non-unique rows. For non-unique rows, they sum column A, sum column B, and take the average of column C.

DF$A <- ifelse(is.na(DF$A.x), DF$A.y,
    ifelse(is.na(DF$A.y), DF$A.x,
        ifelse((!is.na(DF$A.x)) & (!is.na(DF$A.y)), DF$A.x + DF$A.y, 1)))
DF$B <- ifelse(is.na(DF$B.x), DF$B.y, 
    ifelse(is.na(DF$B.y), DF$B.x,
        ifelse((!is.na(DF$B.x)) & (!is.na(DF$B.y)), DF$B.x + DF$B.y, 1)))
DF$C <- ifelse(is.na(DF$C.x), DF$C.y, 
    ifelse(is.na(DF$C.y), DF$C.x,
        ifelse((!is.na(DF$C.x)) & (!is.na(DF$C.y)), (DF$C.x  + DF$C.y)/2, 1)))
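The nested ifelse calls above can probably be condensed. The following is a minimal sketch, not part of the original post, assuming the merged DF carries the .x/.y suffixes shown earlier. It uses base R's rowSums and rowMeans with na.rm = TRUE; the small abc and def frames are made-up stand-ins built from the example values.

```r
# Hypothetical stand-ins for two of the data.frames, using the example values
abc <- data.frame(
  Name = c("name1,name2,name3,name4,name5",
           "name1,name3,name4,name6,name7",
           "name1,name2,name5,name6,name7"),
  A = c(11, 4, 12), B = c(24, 8, 4), C = c(7, 12, 5)
)
def <- data.frame(
  Name = c("name1,name3,name4,name6,name7",
           "name3,name4,name5,name6,name7"),
  A = c(3, 15), B = c(4, 3), C = c(8, 28)
)

DF <- merge(x = abc, y = def, by = "Name", all = TRUE)

# With na.rm = TRUE the NA side of each .x/.y pair is skipped, so a row
# present in only one frame keeps its own value.
# Caveat: if both sides were NA, rowSums would give 0 and rowMeans NaN.
DF$A <- rowSums(DF[, c("A.x", "A.y")], na.rm = TRUE)
DF$B <- rowSums(DF[, c("B.x", "B.y")], na.rm = TRUE)
DF$C <- rowMeans(DF[, c("C.x", "C.y")], na.rm = TRUE)

# Select by column name rather than by position
merge1 <- DF[, c("Name", "A", "B", "C")]
```

Selecting the result columns by name avoids depending on the column positions after the merge.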

DF now looks like this:

Name                            A.x  B.x  C.x  A.y  B.y  C.y   A   B   C   
name1,name2,name3,name4,name5    11   24    7   NA   NA   NA  11  24   7
name1,name3,name4,name6,name7     4    8   12    3    4    8   7  12  10
name1,name2,name5,name6,name7    12    4    5   NA   NA   NA  12   4   5
name3,name4,name5,name6,name7    NA   NA   NA   15    3   28  15   3  28

I then keep just the Name column and the last three columns:

merge1 <- DF[c(1,8,9,10)]

Then I repeat the same process for the next two data.frames and call the result merge2. Then I merge merge1 and merge2.

total1 <- merge(x = merge1, y = merge2, by = "Name", all = TRUE)

I continue merging two data.frames at a time, and then merge the resulting totals data.frames together, also two at a time. I get the end result I want, but it is a time-consuming process and not very efficient.

Another way I think I could do it is to rbind all the data.frames and then, wherever two or more rows have the same list of names in the Name column, collapse them into one row that sums column A, sums column B, and takes the mean of column C. But I am not sure how to do that either.

Here is an example of what I would have after the rbind:

Name                              A     B     C
name1,name2,name3,name4,name5    11    24     7
name1,name3,name4,name6,name7     4     8    12
name1,name2,name5,name6,name7    12     4     5 
name3,name4,name5,name6,name7    15     3    28 
name1,name3,name4,name6,name7     3     4     8

The end result would look like this:

Name                              A     B     C
name1,name2,name3,name4,name5    11    24     7
name1,name3,name4,name6,name7     7    12    10
name1,name2,name5,name6,name7    12     4     5 
name3,name4,name5,name6,name7    15     3    28 

Again, I am sure there are more efficient ways to do this than what I am currently doing, so any help would be greatly appreciated.

Recommended answer

I think your second approach is the way to go, and you can do that with data.table or dplyr.

Here are a few steps using data.table. First, if your data frames are abc, def, ... do:

DF <- do.call(rbind, list(abc,def,...))

Now you can convert the result into a data.table:

library(data.table)
DT <- data.table(DF)

Then just do something like:

DTres <- DT[, .(A = sum(A, na.rm = TRUE), B = sum(B, na.rm = TRUE), C = mean(C, na.rm = TRUE)), by = Name]
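To make the steps above concrete, here is a minimal end-to-end sketch. It assumes the data.table package is installed; the abc and def frames are made-up stand-ins built from the question's example values, and with real data the list would hold all 20+ frames.

```r
library(data.table)

# Hypothetical stand-ins for two of the data.frames, using the example values
abc <- data.frame(
  Name = c("name1,name2,name3,name4,name5",
           "name1,name3,name4,name6,name7",
           "name1,name2,name5,name6,name7"),
  A = c(11, 4, 12), B = c(24, 8, 4), C = c(7, 12, 5)
)
def <- data.frame(
  Name = c("name1,name3,name4,name6,name7",
           "name3,name4,name5,name6,name7"),
  A = c(3, 15), B = c(4, 3), C = c(8, 28)
)

# Stack every frame row-wise; all frames must share the same columns
DF <- do.call(rbind, list(abc, def))
DT <- data.table(DF)

# One pass per group: sum A, sum B, average C for each distinct Name
DTres <- DT[, .(A = sum(A, na.rm = TRUE),
                B = sum(B, na.rm = TRUE),
                C = mean(C, na.rm = TRUE)),
            by = Name]
print(DTres)
```

With these inputs the result has four rows, and the Name that appears in both frames ends up with A = 7, B = 12, C = 10, as in the question's desired end result. Grouping after a single rbind also takes the mean of C over all matching rows at once, whereas averaging two frames at a time would not weight the rows equally once more than two frames share a Name. data.table's rbindlist(list(abc, def)) could replace the do.call(rbind, ...) and data.table(DF) steps.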

Check the data.table vignettes to get a better idea of how that package works.
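The answer mentions dplyr as an alternative without showing it. A rough equivalent might look like the sketch below, assuming the dplyr package is installed; abc and def are again made-up stand-ins built from the question's example values.

```r
library(dplyr)

# Hypothetical stand-ins for two of the data.frames, using the example values
abc <- data.frame(
  Name = c("name1,name2,name3,name4,name5",
           "name1,name3,name4,name6,name7",
           "name1,name2,name5,name6,name7"),
  A = c(11, 4, 12), B = c(24, 8, 4), C = c(7, 12, 5)
)
def <- data.frame(
  Name = c("name1,name3,name4,name6,name7",
           "name3,name4,name5,name6,name7"),
  A = c(3, 15), B = c(4, 3), C = c(8, 28)
)

# bind_rows stacks the frames; group_by + summarise collapses rows by Name
res <- bind_rows(abc, def) %>%
  group_by(Name) %>%
  summarise(A = sum(A, na.rm = TRUE),
            B = sum(B, na.rm = TRUE),
            C = mean(C, na.rm = TRUE))
print(res)
```

bind_rows also accepts a list of data frames, which is convenient when the 20+ frames are already collected in one list.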
