Merging data frames without duplicating rows

Question

I would like to merge two data frames, but I do not want to duplicate rows if there is more than one match. Instead, I would like to sum the observations on that day.


From ?merge: The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each.

Here is some example code:

days <- as.data.frame(as.Date(c("2012-1-1", "2012-1-2", "2012-1-3", "2012-1-4")))

names(days) <- "Date"
obs.days <- as.data.frame(as.Date(c("2012-1-2", "2012-1-3", "2012-1-3")))
obs.days$count <- 1
colnames(obs.days) <- c("Date", "Count")
df <- merge(days, obs.days, by.x="Date", by.y="Date", all.x=TRUE)

I would like the final data frame to list 2012-1-3 only once, with a Count value of 2.

Recommended answer

I'd suggest you merge them and then aggregate the result (essentially perform a SUM for each unique Date).

df <- merge(days, obs.days, by.x="Date", by.y="Date", all.x=TRUE)
        Date Count
1 2012-01-01    NA
2 2012-01-02     1
3 2012-01-03     1
4 2012-01-03     1
5 2012-01-04    NA

Now, to do the aggregation, you could use aggregate:

df2 <- aggregate(df$Count,list(df$Date),sum)
     Group.1  x
1 2012-01-01 NA
2 2012-01-02  1
3 2012-01-03  2
4 2012-01-04 NA
names(df2)<-names(df)
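
As a variation on the same idea, the order can be swapped: sum obs.days by Date first, then merge. This is a minimal sketch and not part of the original answer; the names obs.sum and df3 are made up for illustration, and the example data is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data from the question.
days <- data.frame(
  Date = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03", "2012-01-04"))
)
obs.days <- data.frame(
  Date  = as.Date(c("2012-01-02", "2012-01-03", "2012-01-03")),
  Count = 1
)

# Sum the counts per date first, so each date occurs at most once
# in the right-hand table (obs.sum is a hypothetical name).
obs.sum <- aggregate(Count ~ Date, data = obs.days, FUN = sum)

# A left join against a table with unique dates cannot duplicate rows;
# days without any observation get NA in Count.
df3 <- merge(days, obs.sum, by = "Date", all.x = TRUE)
print(df3)
```

Because the right-hand table has one row per Date before the join, the merged result has exactly as many rows as days.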

But I'd recommend the plyr package, which is awesome! In particular, the function ddply:

library(plyr)
ddply(df,.(Date),function(x) data.frame(Date=x$Date[1],Count=sum(x$Count)))
        Date Count
1 2012-01-01    NA
2 2012-01-02     1
3 2012-01-03     2
4 2012-01-04    NA

The command ddply(df, .(Date), FUN) is essentially:

for each date in unique(df$Date):
    add to output dataframe FUN( df[df$Date==date,] )

So the function I've provided creates a one-row data frame with the columns Date and Count, where Count is the sum of all counts for that date.
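
The loop described above can be approximated in base R with split, lapply and rbind. This is only a sketch of the split-apply-combine idea, not how plyr is implemented internally; summarise_piece, pieces, results and out are hypothetical names, and df is rebuilt by hand to match the merged result shown earlier.

```r
# The merged data frame from above, rebuilt so the snippet runs on its own.
df <- data.frame(
  Date  = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03",
                    "2012-01-03", "2012-01-04")),
  Count = c(NA, 1, 1, 1, NA)
)

# Split: one sub-data-frame per unique Date.
pieces <- split(df, df$Date)

# Apply: the same function that was passed to ddply above.
summarise_piece <- function(x) {
  data.frame(Date = x$Date[1], Count = sum(x$Count))
}
results <- lapply(pieces, summarise_piece)

# Combine: stack the one-row results into a single data frame.
out <- do.call(rbind, results)
rownames(out) <- NULL
print(out)
```

As with the ddply call, dates whose only Count is NA stay NA, because sum() returns NA when given NA without na.rm = TRUE.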
