R: How to sum values from rows only if the key value is the same and also if the key is duplicated in consecutive rows?


Problem description


I have a data table with the following structure:

+-------------------+
| id  | key | value |
+-----+-----+-------+
| 1   | A   | 1000  |
| 1   | A   | 2000  |
| 1   | B   | 2001  |
| 1   | A   | 2002  |
| 1   | A   | 2004  |
| 2   | B   | 2002  |
| 2   | C   | 2002  |
+-------------------+



My objective is to sum the values by id and key, but instead of simply grouping by id and key, I only want to sum the values when the (id, key) pair is the same in consecutive rows.

The result should be:

+-------------------+
| id  | key | value |
+-----+-----+-------+
| 1   | A   | 3000  |
| 1   | B   | 2001  |
| 1   | A   | 4006  |
| 2   | B   | 2002  |
| 2   | C   | 2002  |
+-------------------+



Is there any way to achieve this result?

Recommended answer

We can use rleid from the devel version of data.table, i.e. v1.9.5+. Instructions for installing the devel version are here.

We convert the 'data.frame' to a 'data.table' and create another grouping column, 'ind', from the 'key' column with rleid. Grouped by 'ind' and 'id', we take the sum of 'value' and the first element of 'key'. Finally, we assign 'ind' to NULL, since it is not needed in the expected output.


```r
library(data.table)
setDT(df1)[, list(value = sum(value), key = key[1L]),
           by = .(ind = rleid(key), id)][, ind := NULL][]
#    id value key
# 1:  1  3000   A
# 2:  1  2001   B
# 3:  1  4006   A
# 4:  2  2002   B
# 5:  2  2002   C
```
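To make the grouping column concrete: rleid(key) gives every row the number of the run of identical consecutive 'key' values it belongs to. As an illustration only, the base R sketch below builds the same run index with rle(); the helper name run_index is hypothetical and is not part of data.table.

```r
# Hypothetical helper (illustration only): a base R analogue of rleid() for one vector.
run_index <- function(x) {
  runs <- rle(x)                               # lengths of runs of equal consecutive values
  rep(seq_along(runs$lengths), runs$lengths)   # repeat each run's number once per row in it
}

key <- c("A", "A", "B", "A", "A", "B", "C")
run_index(key)
# [1] 1 1 2 3 3 4 5
```

Grouping by this index together with 'id' is what keeps the two separate runs of 'A' for id 1 apart.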

Or, as @Frank suggested, we can pass multiple columns to rleid and use the result as the grouping variable, take the first element of the other variables and the sum of 'value', and then either assign the unwanted column to NULL or drop it with standard data.table subsetting using with = FALSE.


```r
setDT(df1)[, list(id = id[1L], key = key[1L], value = sum(value)),
           by = .(r = rleid(id, key))][, r := NULL][]
#    id key value
# 1:  1   A  3000
# 2:  1   B  2001
# 3:  1   A  4006
# 4:  2   B  2002
# 5:  2   C  2002
```
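rleid(id, key) starts a new run whenever either column changes. If data.table is not available, a rough base R equivalent can be sketched as follows; run_id is a hypothetical helper written only for this illustration, and tapply() is used for the per-run sums.

```r
# Hypothetical helper (not a data.table function): starts a new run whenever
# ANY of the supplied columns differs from the previous row, like rleid(id, key).
run_id <- function(...) {
  cols <- list(...)
  n <- length(cols[[1]])
  if (n == 0L) return(integer(0))
  changed <- rep(FALSE, n - 1L)
  for (x in cols) changed <- changed | (x[-1L] != x[-n])
  cumsum(c(TRUE, changed))  # integer run number for every row
}

df1 <- data.frame(
  id    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L),
  key   = c("A", "A", "B", "A", "A", "B", "C"),
  value = c(1000L, 2000L, 2001L, 2002L, 2004L, 2002L, 2002L),
  stringsAsFactors = FALSE
)

r   <- run_id(df1$id, df1$key)
out <- df1[!duplicated(r), c("id", "key")]         # first row of each run
out$value <- as.vector(tapply(df1$value, r, sum))  # run totals, ordered by run number
rownames(out) <- NULL
out
```

Because the run numbers never decrease down the table, tapply() returns the sums in the same order as the first rows of the runs, so the two pieces line up.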




Or we can use dplyr. We create the grouping variable 'ind' by comparing adjacent elements of 'key', then use summarise to get the sum of 'value' and the first element of 'key'.
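The dplyr code for this approach did not survive on this page. The following is a reconstruction sketch that follows the description (a cumulative sum over changes in 'key', then summarise); it is not necessarily the answerer's original code, and it assumes dplyr 1.0 or later for the .groups argument.

```r
library(dplyr)

df1 <- data.frame(
  id    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L),
  key   = c("A", "A", "B", "A", "A", "B", "C"),
  value = c(1000L, 2000L, 2001L, 2002L, 2004L, 2002L, 2002L),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # 'ind' goes up by 1 every time 'key' differs from the previous row
  group_by(id, ind = cumsum(key != lag(key, default = first(key)))) %>%
  summarise(key = first(key), value = sum(value), .groups = "drop") %>%
  select(-ind)  # the helper column is not part of the expected output
res
```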

NOTE: In the dplyr and data.table solutions we can also use the 'key' column as a grouping variable and remove key=key[1L] or key=first(key).
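Following the NOTE, a data.table sketch with 'key' moved into by (so key = key[1L] is dropped) might look like this; it assumes a data.table version that provides rleid (1.9.5+ at the time of the answer).

```r
library(data.table)

df1 <- data.frame(
  id    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L),
  key   = c("A", "A", "B", "A", "A", "B", "C"),
  value = c(1000L, 2000L, 2001L, 2002L, 2004L, 2002L, 2002L),
  stringsAsFactors = FALSE
)

# 'key' is now a grouping column, so it is carried into the result automatically
res <- setDT(df1)[, list(value = sum(value)),
                  by = .(ind = rleid(key), id, key)][, ind := NULL][]
res
```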


Or we transform the dataset by creating the 'ind' column and use aggregate from base R to get the expected output.


```r
df1 <- transform(df1, ind = cumsum(c(TRUE, head(key, -1) != tail(key, -1))))
aggregate(value ~ ., df1, FUN = sum)[-3]
#   id key value
# 1  1   A  3000
# 2  1   B  2001
# 3  1   A  4006
# 4  2   B  2002
# 5  2   C  2002
```



data






```r
df1 <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L),
                      key = c("A", "A", "B", "A", "A", "B", "C"),
                      value = c(1000L, 2000L, 2001L, 2002L, 2004L, 2002L, 2002L)),
                 .Names = c("id", "key", "value"),
                 class = "data.frame", row.names = c(NA, -7L))
```
