Aggregating while merging two dataframes in R
The ultimate goal is to sum the total quantity (transact_data$qty) for each record in product_info where transact_data$productId exists in product_info and transact_data$date is between product_info$beg_date and product_info$end_date.
The dataframes are below:
product_info <- data.frame(productId = c("A", "B", "A", "C","C","B"),
old_price = c(0.5,0.10,0.11,0.12,0.3,0.4),
new_price = c(0.7,0.11,0.12,0.11,0.2,0.3),
beg_date = c("2014-05-01", "2014-06-01", "2014-05-01", "2014-06-01","2014-05-01", "2014-06-01"),
end_date = c("2014-05-31", "2014-06-31", "2014-05-31", "2014-06-31","2014-05-31", "2014-06-31"), stringsAsFactors=FALSE)
transact_data <- data.frame(productId=c('A', 'B','A', 'C','A', 'B','C', 'B','A', 'C','A', 'B'),
date=c("2014-05-05", "2014-06-22", "2014-07-05", "2014-08-31","2014-05-03", "2014-02-22",
"2014-05-21", "2014-06-19", "2014-03-09", "2014-06-22","2014-04-03", "2014-07-08"),
qty =c(12,15,5,21,13,17,2,5,11,9,6,4), stringsAsFactors=FALSE)
My first step was to merge both dataframes by productId:
sku_transact_merge <-merge(x=product_info, y=transact_data, by = c("productId"))
The next step was to calculate the quantity sum:
sku_transact_merge$total_qty <- ifelse(sku_transact_merge$date >= sku_transact_merge$beg_date &
sku_transact_merge$date <= sku_transact_merge$end_date,
aggregate(qty ~ productId+beg_date+end_date,
data= sku_transact_merge, sum), 0)
The result is not what I want, and I'm getting an error that says:
(list) object cannot be coerced to type 'double'
Any pointers on how to properly execute this logic would be much appreciated!
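The error appears to stem from aggregate() returning a data frame (which is a list of columns) rather than a numeric vector aligned with the rows of sku_transact_merge, so it cannot serve as the "yes" argument of ifelse(). A minimal base R sketch, assuming the date columns stay as ISO "YYYY-MM-DD" strings (which compare in calendar order), is to filter the merged rows first and aggregate afterwards. The names merged, in_range and totals are illustrative only and do not come from the question.

```r
# Data from the question; dates are kept as ISO "YYYY-MM-DD" strings (assumption:
# no Date conversion, so the comparisons below are plain string comparisons).
product_info <- data.frame(
  productId = c("A", "B", "A", "C", "C", "B"),
  old_price = c(0.5, 0.10, 0.11, 0.12, 0.3, 0.4),
  new_price = c(0.7, 0.11, 0.12, 0.11, 0.2, 0.3),
  beg_date  = c("2014-05-01", "2014-06-01", "2014-05-01",
                "2014-06-01", "2014-05-01", "2014-06-01"),
  end_date  = c("2014-05-31", "2014-06-31", "2014-05-31",
                "2014-06-31", "2014-05-31", "2014-06-31"),
  stringsAsFactors = FALSE
)
transact_data <- data.frame(
  productId = c("A", "B", "A", "C", "A", "B", "C", "B", "A", "C", "A", "B"),
  date = c("2014-05-05", "2014-06-22", "2014-07-05", "2014-08-31",
           "2014-05-03", "2014-02-22", "2014-05-21", "2014-06-19",
           "2014-03-09", "2014-06-22", "2014-04-03", "2014-07-08"),
  qty = c(12, 15, 5, 21, 13, 17, 2, 5, 11, 9, 6, 4),
  stringsAsFactors = FALSE
)

# Step 1: pair every product record with every transaction of the same product.
merged <- merge(product_info, transact_data, by = "productId")

# Step 2: keep only the pairs whose transaction date lies inside the record's window.
in_range <- merged[merged$date >= merged$beg_date & merged$date <= merged$end_date, ]

# Step 3: sum qty per product record (all record columns act as grouping keys).
totals <- aggregate(qty ~ productId + old_price + new_price + beg_date + end_date,
                    data = in_range, FUN = sum)
totals <- totals[order(totals$productId, totals$old_price), ]
print(totals)
```

Filtering before aggregating means aggregate() only ever sees qualifying rows, so ifelse() is not needed at all.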
Another way to do this is with the dplyr package
(this should be efficient if your data set is large):
library(dplyr)
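# Keep only the merged rows whose date falls within [beg_date, end_date] (inclusive, as in the question)
df = subset(sku_transact_merge, date >= beg_date & date <= end_date)
# The transaction date is no longer needed once the rows are filtered
df = subset(df, select = -c(date))
# Sum qty per product record, then collapse the now-identical rows
out = unique(df %>% group_by(productId, old_price) %>% mutate(qty = sum(qty)))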
#> out
#Source: local data frame [6 x 6]
#Groups: productId, old_price
#productId old_price new_price beg_date end_date qty
#1 A 0.50 0.70 2014-05-01 2014-05-31 25
#2 A 0.11 0.12 2014-05-01 2014-05-31 25
#3 B 0.10 0.11 2014-06-01 2014-06-31 20
#4 B 0.40 0.30 2014-06-01 2014-06-31 20
#5 C 0.12 0.11 2014-06-01 2014-06-31 9
#6 C 0.30 0.20 2014-05-01 2014-05-31 2
Alternatively, you could use data.table:
library(data.table)
out = setDT(df)[, list(qtynew = sum(qty)), by = list(productId, old_price)]
#> out
# productId old_price qtynew
#1: A 0.50 25
#2: A 0.11 25
#3: B 0.10 20
#4: B 0.40 20
#5: C 0.12 9
#6: C 0.30 2
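Both approaches above work from the filtered df, so a product_info record with no in-range transactions would drop out of the result rather than show a total of 0. As a hedged alternative (not the answer author's method), a row-wise base R sketch can compute the total directly for each record and skip the merge entirely. It assumes the dates remain ISO-formatted strings, and the column name total_qty is chosen here for illustration. It is likely to be slower than data.table on large inputs.

```r
# Data from the question; dates are kept as ISO "YYYY-MM-DD" strings.
product_info <- data.frame(
  productId = c("A", "B", "A", "C", "C", "B"),
  old_price = c(0.5, 0.10, 0.11, 0.12, 0.3, 0.4),
  new_price = c(0.7, 0.11, 0.12, 0.11, 0.2, 0.3),
  beg_date  = c("2014-05-01", "2014-06-01", "2014-05-01",
                "2014-06-01", "2014-05-01", "2014-06-01"),
  end_date  = c("2014-05-31", "2014-06-31", "2014-05-31",
                "2014-06-31", "2014-05-31", "2014-06-31"),
  stringsAsFactors = FALSE
)
transact_data <- data.frame(
  productId = c("A", "B", "A", "C", "A", "B", "C", "B", "A", "C", "A", "B"),
  date = c("2014-05-05", "2014-06-22", "2014-07-05", "2014-08-31",
           "2014-05-03", "2014-02-22", "2014-05-21", "2014-06-19",
           "2014-03-09", "2014-06-22", "2014-04-03", "2014-07-08"),
  qty = c(12, 15, 5, 21, 13, 17, 2, 5, 11, 9, 6, 4),
  stringsAsFactors = FALSE
)

# For each product_info row, sum the qty of transactions with the same productId
# whose date lies within [beg_date, end_date]. sum() of an empty selection is 0,
# so records without any matching transaction are kept with a total of 0.
product_info$total_qty <- mapply(
  function(id, beg, end) {
    hit <- transact_data$productId == id &
      transact_data$date >= beg &
      transact_data$date <= end
    sum(transact_data$qty[hit])
  },
  product_info$productId, product_info$beg_date, product_info$end_date,
  USE.NAMES = FALSE
)
print(product_info)
```

The result keeps product_info in its original row order with one extra column, which may be closer to the stated goal of a total "for each record in product_info".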