Sum by distinct column value in R
Question
I have a very large data frame in R and would like to sum two columns for every distinct value of some other columns. For example, say we have a data frame of transactions in various shops over a day:
shop <- data.frame('shop_id' = c(1, 1, 1, 2, 3, 3),
'shop_name' = c('Shop A', 'Shop A', 'Shop A', 'Shop B', 'Shop C', 'Shop C'),
'city' = c('London', 'London', 'London', 'Cardiff', 'Dublin', 'Dublin'),
'sale' = c(12, 5, 9, 15, 10, 18),
'profit' = c(3, 1, 3, 6, 5, 9))
This gives:
shop_id shop_name city sale profit
1 Shop A London 12 3
1 Shop A London 5 1
1 Shop A London 9 3
2 Shop B Cardiff 15 6
3 Shop C Dublin 10 5
3 Shop C Dublin 18 9
I want to sum the sale and profit for each shop to give:
shop_id shop_name city sale profit
1 Shop A London 26 7
2 Shop B Cardiff 15 6
3 Shop C Dublin 28 14
I am currently using the following code to do this:
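library(plyr)  # provides ddply()
# Overwrite sale and profit with per-shop totals, then keep one row per shop
shop_day <- ddply(shop, "shop_id", transform, sale = sum(sale), profit = sum(profit))
shop_day <- subset(shop_day, !duplicated(shop_id))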
This works absolutely fine, but as I said, my data frame is large (140,000 rows, 37 columns, and nearly 100,000 unique rows to sum over), so the code takes ages to run and eventually reports that it has run out of memory.
Does anyone know the most efficient way to do this?
Thanks in advance!
Recommended answer
I think the neatest way to do this is in dplyr:
library(dplyr)
shop %>%
group_by(shop_id, shop_name, city) %>%
summarise_all(sum)
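Note that summarise_all() applies sum to every non-grouping column, so on a 37-column data frame it may be safer to name the columns to total. As a rough sketch of the same grouped sum without any packages, base R's aggregate() can do this with a formula. This is an alternative to the dplyr answer, not a claim about which is fastest on the real data; the result name shop_day is borrowed from the question, and the final reordering is only there because aggregate() does not return groups in shop_id order.

```r
# Sample data from the question
shop <- data.frame(
  shop_id   = c(1, 1, 1, 2, 3, 3),
  shop_name = c("Shop A", "Shop A", "Shop A", "Shop B", "Shop C", "Shop C"),
  city      = c("London", "London", "London", "Cardiff", "Dublin", "Dublin"),
  sale      = c(12, 5, 9, 15, 10, 18),
  profit    = c(3, 1, 3, 6, 5, 9)
)

# Sum only sale and profit within each (shop_id, shop_name, city) group.
# Columns on the left of ~ are summed; columns on the right define the groups.
shop_day <- aggregate(cbind(sale, profit) ~ shop_id + shop_name + city,
                      data = shop, FUN = sum)

# Put the groups back in shop_id order to match the layout in the question
shop_day <- shop_day[order(shop_day$shop_id), ]
rownames(shop_day) <- NULL

print(shop_day)
```

Any other columns in a wider data frame would need to be either added to the grouping side of the formula or dropped, since the formula only carries through the columns it names.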