Aggregating while merging two dataframes in R

Problem description

The ultimate goal is to sum the total quantity (transact_data$qty) for each record in product_info, where transact_data$productId matches that record's productId and transact_data$date is between product_info$beg_date and product_info$end_date.

The dataframes are below:

product_info <- data.frame(productId = c("A", "B", "A", "C","C","B"), 
                      old_price = c(0.5,0.10,0.11,0.12,0.3,0.4),
                      new_price = c(0.7,0.11,0.12,0.11,0.2,0.3),
                      beg_date = c("2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01","2014-05-01", "2014-06-01"),
                      end_date = c("2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31","2014-05-31", "2014-06-31"), stringsAsFactors=FALSE)

transact_data <- data.frame(productId=c('A', 'B','A', 'C','A', 'B','C', 'B','A', 'C','A', 'B'),
                  date=c("2014-05-05", "2014-06-22", "2014-07-05", "2014-08-31","2014-05-03", "2014-02-22",
                    "2014-05-21", "2014-06-19", "2014-03-09", "2014-06-22","2014-04-03", "2014-07-08"),
                    qty =c(12,15,5,21,13,17,2,5,11,9,6,4), stringsAsFactors=FALSE)
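A note on the date test: both date columns are character vectors in ISO-8601 form (YYYY-MM-DD). Because these strings are fixed-width and run from year down to day, ordinary string comparison with >= and <= puts them in chronological order, so the "between" condition works without converting to Date. Keeping them as text also means the end date "2014-06-31", which is not a real calendar day (June has 30 days), still works as an upper bound. A minimal sketch; `in_window()` is a hypothetical helper name, not part of the question:

```r
# Hypothetical helper (not from the question): TRUE where `date` lies inside
# [beg, end], bounds included. Plain string comparison is enough because
# fixed-width "YYYY-MM-DD" strings sort in chronological order.
in_window <- function(date, beg, end) {
  date >= beg & date <= end
}

dates <- c("2014-05-05", "2014-07-05", "2014-05-31", "2014-04-30")
in_window(dates, "2014-05-01", "2014-05-31")         # TRUE FALSE TRUE FALSE

# The non-existent "2014-06-31" still acts as an upper bound for June dates
in_window("2014-06-30", "2014-06-01", "2014-06-31")  # TRUE
```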

My first step was to merge both dataframes by productId:

sku_transact_merge <- merge(x = product_info, y = transact_data, by = "productId")
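Merging by productId alone is a many-to-many join here: every productId appears twice in product_info, so each transaction is paired with both of its product's price periods. That is what the approach needs (a later date filter decides which period each transaction belongs to), but it is worth confirming the row count. A quick check on the question's data, as a sketch:

```r
# Question's data, reproduced so the block runs on its own
product_info <- data.frame(
  productId = c("A", "B", "A", "C", "C", "B"),
  old_price = c(0.5, 0.10, 0.11, 0.12, 0.3, 0.4),
  new_price = c(0.7, 0.11, 0.12, 0.11, 0.2, 0.3),
  beg_date  = c("2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01"),
  end_date  = c("2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31"),
  stringsAsFactors = FALSE
)
transact_data <- data.frame(
  productId = c("A", "B", "A", "C", "A", "B", "C", "B", "A", "C", "A", "B"),
  date = c("2014-05-05", "2014-06-22", "2014-07-05", "2014-08-31", "2014-05-03", "2014-02-22",
           "2014-05-21", "2014-06-19", "2014-03-09", "2014-06-22", "2014-04-03", "2014-07-08"),
  qty = c(12, 15, 5, 21, 13, 17, 2, 5, 11, 9, 6, 4),
  stringsAsFactors = FALSE
)

sku_transact_merge <- merge(x = product_info, y = transact_data, by = "productId")

# 2 periods x 5 transactions (A), 2 x 4 (B), 2 x 3 (C)
table(sku_transact_merge$productId)  # A: 10, B: 8, C: 6
nrow(sku_transact_merge)             # 24
```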

The next step was to calculate the quantity sum:

sku_transact_merge$total_qty <- ifelse(sku_transact_merge$date >= sku_transact_merge$beg_date & 
                                       sku_transact_merge$date <= sku_transact_merge$end_date, 
                                     aggregate(qty ~ productId+beg_date+end_date,
                                               data= sku_transact_merge, sum), 0)

The result is not what I want, and I'm getting an error that says:

(list) object cannot be coerced to type 'double'

Any pointers on how to properly execute this logic would be much appreciated!
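Why the attempt fails, most likely: ifelse() needs its yes argument to supply one value per merged row, but aggregate() returns a data frame (internally a list of columns) with one row per group, which is the probable source of the "(list) object" coercion error. Separately, the date condition is evaluated outside aggregate(), so the group sums would still include transactions outside each period. A small check on the question's data, as a sketch:

```r
# Question's data, reproduced so the block runs on its own
product_info <- data.frame(
  productId = c("A", "B", "A", "C", "C", "B"),
  old_price = c(0.5, 0.10, 0.11, 0.12, 0.3, 0.4),
  new_price = c(0.7, 0.11, 0.12, 0.11, 0.2, 0.3),
  beg_date  = c("2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01"),
  end_date  = c("2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31"),
  stringsAsFactors = FALSE
)
transact_data <- data.frame(
  productId = c("A", "B", "A", "C", "A", "B", "C", "B", "A", "C", "A", "B"),
  date = c("2014-05-05", "2014-06-22", "2014-07-05", "2014-08-31", "2014-05-03", "2014-02-22",
           "2014-05-21", "2014-06-19", "2014-03-09", "2014-06-22", "2014-04-03", "2014-07-08"),
  qty = c(12, 15, 5, 21, 13, 17, 2, 5, 11, 9, 6, 4),
  stringsAsFactors = FALSE
)
sku_transact_merge <- merge(x = product_info, y = transact_data, by = "productId")

# aggregate() collapses the data: one row per (productId, beg_date, end_date)
# group, not one value per merged row, and with no date filter applied.
agg <- aggregate(qty ~ productId + beg_date + end_date,
                 data = sku_transact_merge, FUN = sum)

is.data.frame(agg)        # TRUE
nrow(agg)                 # 4 groups
nrow(sku_transact_merge)  # 24 rows that ifelse() would need values for
```

The usual fix is to filter the merged rows first and aggregate second, which is what the solution below does.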

Solution

Another way to do this is with dplyr, which should also hold up well if your data set is large:

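library(dplyr)
# keep only transactions inside each row's price period (bounds included, as in the question)
df = subset(sku_transact_merge, date >= beg_date & date <= end_date)
# drop the transaction date so rows of one period differ only in qty
df = subset(df, select = -c(date))
# replace qty by its group total, then collapse the now-identical rows
out = unique(df %>% group_by(productId, old_price) %>% mutate(qty = sum(qty)))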

#> out
#Source: local data frame [6 x 6]
#Groups: productId, old_price

#productId old_price new_price   beg_date   end_date qty
#1         A      0.50      0.70 2014-05-01 2014-05-31  25
#2         A      0.11      0.12 2014-05-01 2014-05-31  25
#3         B      0.10      0.11 2014-06-01 2014-06-31  20
#4         B      0.40      0.30 2014-06-01 2014-06-31  20
#5         C      0.12      0.11 2014-06-01 2014-06-31   9
#6         C      0.30      0.20 2014-05-01 2014-05-31   2
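If you prefer to avoid extra packages, the same filter-then-sum idea can be written in base R. This is a sketch of that swap, not the answerer's code: it keeps the question's inclusive bounds and lists every product_info column in the aggregate() formula, so one row comes back per product_info record with the period columns intact.

```r
# Question's data, reproduced so the block runs on its own
product_info <- data.frame(
  productId = c("A", "B", "A", "C", "C", "B"),
  old_price = c(0.5, 0.10, 0.11, 0.12, 0.3, 0.4),
  new_price = c(0.7, 0.11, 0.12, 0.11, 0.2, 0.3),
  beg_date  = c("2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01"),
  end_date  = c("2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31"),
  stringsAsFactors = FALSE
)
transact_data <- data.frame(
  productId = c("A", "B", "A", "C", "A", "B", "C", "B", "A", "C", "A", "B"),
  date = c("2014-05-05", "2014-06-22", "2014-07-05", "2014-08-31", "2014-05-03", "2014-02-22",
           "2014-05-21", "2014-06-19", "2014-03-09", "2014-06-22", "2014-04-03", "2014-07-08"),
  qty = c(12, 15, 5, 21, 13, 17, 2, 5, 11, 9, 6, 4),
  stringsAsFactors = FALSE
)
sku_transact_merge <- merge(x = product_info, y = transact_data, by = "productId")

# Keep merged rows whose transaction date lies inside the row's price period
# (inclusive on both ends, matching the question's >= / <= test).
in_period <- with(sku_transact_merge, date >= beg_date & date <= end_date)

# Group by every product_info column so each product_info record gets one total.
totals <- aggregate(qty ~ productId + old_price + new_price + beg_date + end_date,
                    data = sku_transact_merge[in_period, ], FUN = sum)
totals
```

One difference from a left join: a product_info record with no transaction in its period would simply be missing from totals rather than showing 0.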

Alternatively, you could use data.table:

library(data.table)
# setDT() converts df to a data.table in place; then sum qty per (productId, old_price) group
out = setDT(df)[, list(qtynew = sum(qty)), by = list(productId, old_price)]

#> out
#   productId old_price qtynew
#1:         A      0.50     25
#2:         A      0.11     25
#3:         B      0.10     20
#4:         B      0.40     20
#5:         C      0.12      9
#6:         C      0.30      2
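Because the goal is one total per product_info record, the join can also be skipped altogether by looping over product_info rows. A minimal base-R sketch (the column name total_qty is an assumption, borrowed from the question's attempt): it keeps product_info's original row order and gives 0 to any record without matching transactions, at the cost of comparing every row against every transaction, so the dplyr or data.table versions above are probably a better fit for large tables.

```r
# Question's data, reproduced so the block runs on its own
product_info <- data.frame(
  productId = c("A", "B", "A", "C", "C", "B"),
  old_price = c(0.5, 0.10, 0.11, 0.12, 0.3, 0.4),
  new_price = c(0.7, 0.11, 0.12, 0.11, 0.2, 0.3),
  beg_date  = c("2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01"),
  end_date  = c("2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31"),
  stringsAsFactors = FALSE
)
transact_data <- data.frame(
  productId = c("A", "B", "A", "C", "A", "B", "C", "B", "A", "C", "A", "B"),
  date = c("2014-05-05", "2014-06-22", "2014-07-05", "2014-08-31", "2014-05-03", "2014-02-22",
           "2014-05-21", "2014-06-19", "2014-03-09", "2014-06-22", "2014-04-03", "2014-07-08"),
  qty = c(12, 15, 5, 21, 13, 17, 2, 5, 11, 9, 6, 4),
  stringsAsFactors = FALSE
)

# For each product_info row i, sum qty over transactions of the same product
# whose date falls inside that row's [beg_date, end_date] window (inclusive).
product_info$total_qty <- vapply(seq_len(nrow(product_info)), function(i) {
  hit <- transact_data$productId == product_info$productId[i] &
         transact_data$date >= product_info$beg_date[i] &
         transact_data$date <= product_info$end_date[i]
  sum(transact_data$qty[hit])  # sum() of an empty selection is 0
}, numeric(1))

product_info
```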
