合并行并求和它们的值 [英] Combine rows and sum their values
问题描述
下面是我原始数据帧的一小部分。我需要合并在特定季节重复特定 id
的行以及其 lic
和 vessel
是不同的。通过组合,我需要总和 qtty
和 grossTon
。
Below there is a fraction of my original data frame. I need to combine those rows in which a specific id
is repeated in a specific season and its lic
and vessel
are different. By combining I need sum qtty
and grossTon
.
请在季节1998
中使用 id 431
season lic id qtty vessel grossTon
…
1998 16350 431 40 435 57
1998 16353 431 28 303 22.54
…
同一主题 431
有两个不同的 lic
(16350& 16353)和两个不同的船只
(435和303)。在此特定情况下预期的结果是:
The same subject 431
has two different lic
(16350 & 16353) and two different vessels
(435 & 303). The result expected in this specific case is:
season lic id qtty vessel grossTon
…
1998 16350 431 68 435 79.54
…
我不介意哪个 lic
和 vessel
在结果行中提醒,我要保持季节
, id
以及 qtty
和 grossTon
的总和。在上面的示例中,我手动选择了 lic 16350
和 vessel 435
。
I don't mind which lic
and vessel
remind in the resulting row, what I want is to keep season
, id
and the resulting sum of qtty
and grossTon
. In the above example I manually chose lic 16350
and vessel 435
.
说实话,我不知道该怎么做,所以我非常感谢您的帮助。
To be honest I have no idea what to do, so I'd much appreciate any help.
谢谢
原始数据(* =要合并的行)
Original data (*= rows to be combined)
season lic id qtty vessel grossTon
1998 15593 411 40 2643 31.5
1999 27271 411 40 2643 31.5
2000 35758 411 40 2643 31.5
2001 45047 411 50 2643 31.5
2002 56291 411 55 2643 31.5
2003 66991 411 55 2643 31.5
2004 80581 411 55 2643 31.5
2005 95058 411 52 NA NA
2006 113379 411 50 10911 4.65
2007 120894 411 50 10911 4.65
2008 130033 411 50 2483 8.5
2009 139201 411 46 2296 50
2010 148833 411 46 2296 50
2011 158395 411 46 2296 50
1998 16350 431 40 435 57 # *
1998 16353 431 28 303 22.54 # *
2000 37491 436 50 2021 19.11
2001 47019 436 50 2021 19.11
2002 57588 436 51 2021 19.11
2003 69128 436 51 2021 19.11
2004 82400 436 52 2021 19.11
2005 95599 436 50 2021 19.11
2006 113126 436 50 2021 19.11
2007 122387 436 50 2021 19.11
2008 131126 436 50 2021 19.11
2009 140417 436 50 2021 19.11
2010 150673 436 50 2021 19.11
2011 159776 436 50 2021 19.11
我还需要保留前一行和后一行仅有一个 id $ c $的行c>每季。像这样:(* =合并后的行数)
Also I need keep those previous and following rows that have just one id
per season. Like this: (*=row resulting after being combined)
season lic id qtty vessel grossTon
1998 15593 411 40 2643 31.5
1999 27271 411 40 2643 31.5
2000 35758 411 40 2643 31.5
2001 45047 411 50 2643 31.5
2002 56291 411 55 2643 31.5
2003 66991 411 55 2643 31.5
2004 80581 411 55 2643 31.5
2005 95058 411 52 NA NA
2006 113379 411 50 10911 4.65
2007 120894 411 50 10911 4.65
2008 130033 411 50 2483 8.5
2009 139201 411 46 2296 50
2010 148833 411 46 2296 50
2011 158395 411 46 2296 50
1998 16350 431 68 435 79.54 #*
2000 37491 436 50 2021 19.11
2001 47019 436 50 2021 19.11
2002 57588 436 51 2021 19.11
2003 69128 436 51 2021 19.11
2004 82400 436 52 2021 19.11
2005 95599 436 50 2021 19.11
2006 113126 436 50 2021 19.11
2007 122387 436 50 2021 19.11
2008 131126 436 50 2021 19.11
2009 140417 436 50 2021 19.11
2010 150673 436 50 2021 19.11
2011 159776 436 50 2021 19.11
推荐答案
如果您打开数据.frame成数据表,您可以充分利用 by
参数
If you turn your data.frame into a data.table you can make great use of the by
argument
library(data.table)
DT <- data.table(DF) # DF is your original data
然后它只是一行:
DT[, lapply(.SD, sum), by=list(season, lic, id, vessel)]
我们只能过滤 1998
Season
,如果我们想要的话:'
We can filter just the 1998
Season
, if we'd like: '
DT[, lapply(.SD, sum), by=list(season, lic, id, vessel)][season==1998]
season lic id vessel qtty grossTon
1: 1998 15593 411 2643 40 31.50
2: 1998 16350 431 435 68 114.00
3: 1998 16353 431 303 68 45.08
整个结果输出看起来像
season lic id vessel qtty grossTon
1: 1998 15593 411 2643 40 31.50
2: 1999 27271 411 2643 40 31.50
3: 2000 35758 411 2643 40 31.50
4: 2001 45047 411 2643 50 31.50
5: 2002 56291 411 2643 55 31.50
6: 2003 66991 411 2643 55 31.50
7: 2004 80581 411 2643 55 31.50
8: 2005 95058 411 NA 52 NA
9: 2006 113379 411 10911 50 4.65
10: 2007 120894 411 10911 50 4.65
11: 2008 130033 411 2483 50 8.50
12: 2009 139201 411 2296 46 50.00
13: 2010 148833 411 2296 46 50.00
14: 2011 158395 411 2296 46 50.00
15: 1998 16350 431 435 68 114.00
16: 1998 16353 431 303 68 45.08
17: 1999 28641 431 303 68 45.08
18: 1999 28644 431 435 68 114.00
19: 2000 37491 436 2021 50 19.11
20: 2001 47019 436 2021 50 19.11
21: 2002 57588 436 2021 51 19.11
22: 2003 69128 436 2021 51 19.11
23: 2004 82400 436 2021 52 19.11
24: 2005 95599 436 2021 50 19.11
25: 2006 113126 436 2021 50 19.11
26: 2007 122387 436 2021 50 19.11
27: 2008 131126 436 2021 50 19.11
28: 2009 140417 436 2021 50 19.11
29: 2010 150673 436 2021 50 19.11
30: 2011 159776 436 2021 50 19.11
season lic id vessel qtty grossTon
这篇关于合并行并求和它们的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!