Aggregate sum of observations per ID in the same data frame

Problem description

My goal is to create a new column that, for each row, sums the observation for the current date and all earlier observations for the same ID (the data set is sorted by CHR_NR, the ID, and by date within each ID). The running sum should start over whenever a new ID appears.

There might be some NAs; they should be treated as 0.

"Doseringer_pr_kg_dyr" is the observation.
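On treating NAs as 0: in base R, "cumsum" propagates a missing value to every later element, so NAs have to be replaced before accumulating. A small illustration with made-up numbers:

```r
x <- c(5, NA, 2, 3)

# cumsum() turns everything from the first NA onward into NA
naive <- cumsum(x)

# Treat NA as 0 first, then accumulate
x0 <- x
x0[is.na(x0)] <- 0
fixed <- cumsum(x0)

print(naive)  # 5 NA NA NA
print(fixed)  # 5 5 7 10
```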

CHR_NR  DATO_AFSLUT  Doseringer_pr_kg_dyr  brugstid
10358   2018-08-06   29416.67              31
10358   2018-09-06   104682.27             36
10358   2018-10-12   10333.33              26
10358   2018-11-07   10090.91              27
10358   2018-12-04   8000.00               NA
13168   2012-01-23   12042.25              2
13168   2012-01-25   9000.00               42
13168   2012-03-07   44450.70              19
13168   2012-03-26   35000.00              37
13168   2012-05-02   93478.26              70

I would like the result to look something like this:

CHR_NR  DATO_AFSLUT  Doseringer_pr_kg_dyr  brugstid  sum
10358   2018-11-07   10090.91              27        [108,6]+[109,3]
10358   2018-12-04   8000.00               NA        [109,6]+[110,3]
13168   2012-01-23   12042.25              2         [111,3]
13168   2012-01-25   9000.00               42        [111,6]+[112,3]
13168   2012-03-07   44450.70              19        [112,6]+[113,3]

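where each [row, column] + [row, column] entry is evaluated and stored in the new column: the running sum from the previous row (column 6) plus the current row's observation (column 3). The row and column numbers refer to the full data set, not to the excerpt shown.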
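Written as a recurrence (the symbols are introduced here only for illustration): let $x_i$ be "Doseringer_pr_kg_dyr" in row $i$, taken as 0 when NA, let $c_i$ be its CHR_NR, and let $S_i$ be the value of the new column. Then

```latex
S_i =
\begin{cases}
  x_i,           & \text{if } i = 1 \text{ or } c_i \neq c_{i-1},\\
  S_{i-1} + x_i, & \text{if } c_i = c_{i-1}.
\end{cases}
```

When the rows of each ID are contiguous and in date order, $S_i$ is the sum of $x_j$ over all rows $j \le i$ with $c_j = c_i$, i.e. a cumulative sum within each ID. For CHR_NR 13168 in the sample this gives 12042.25, then 12042.25 + 9000.00 = 21042.25, then 21042.25 + 44450.70 = 65492.95.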

I thought of using one of the apply functions, or a loop like:

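agg <- function(dat) {
  # Running total of Doseringer_pr_kg_dyr per CHR_NR; NA counts as 0
  obs <- dat$Doseringer_pr_kg_dyr
  obs[is.na(obs)] <- 0
  dat$sum <- numeric(nrow(dat))
  for (i in seq_len(nrow(dat))) {
    if (i > 1 && dat$CHR_NR[i] == dat$CHR_NR[i - 1]) {
      # Same ID as the previous row: add to its running total
      dat$sum[i] <- dat$sum[i - 1] + obs[i]
    } else {
      # First row or a new ID: start the total over
      dat$sum[i] <- obs[i]
    }
  }
  dat
}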
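Since the question mentions the apply family, here is a minimal sketch of that route in base R, assuming the column names from the table above and rows already sorted by CHR_NR and date. It splits the observations by CHR_NR, takes "cumsum" within each piece with "lapply", and puts the pieces back in the original row order with "unsplit". The data frame is a few rows copied from the sample table, and "running_sum" is a hypothetical column name.

```r
# A few rows from the question's sample (two IDs, sorted by ID and date)
dat <- data.frame(
  CHR_NR = c(10358, 10358, 10358, 13168, 13168, 13168),
  DATO_AFSLUT = as.Date(c("2018-10-12", "2018-11-07", "2018-12-04",
                          "2012-01-23", "2012-01-25", "2012-03-07")),
  Doseringer_pr_kg_dyr = c(10333.33, 10090.91, 8000.00,
                           12042.25, 9000.00, 44450.70)
)

# Treat missing observations as 0 before summing
obs <- dat$Doseringer_pr_kg_dyr
obs[is.na(obs)] <- 0

# Split by ID, accumulate within each group, reassemble in row order
groups <- split(obs, dat$CHR_NR)
dat$running_sum <- unsplit(lapply(groups, cumsum), dat$CHR_NR)

print(dat)
```

"unsplit" is what keeps this safe when the IDs are not in ascending order in the data: each group's results go back to the rows they came from.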

From the comments:

# dput(head(a))
a <- structure(list(CHR_NR = c(10358, 10358, 10358, 10358, 10358, 10358),
  # DATO_AFSLUT stored as Date (days since 1970-01-01)
  DATO_AFSLUT = structure(c(15349, 15387, 15426, 15441, 15455, 15476), class = "Date"),
  Level = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
  Doseringer_pr_kg_dyr = c(276152.688936271, 161616.695196052, 127549.514333096, 13333.3333333333, 86255.3897180763, 31034.1151385928),
  brugstid = c(38, 39, 15, 14, 21, 15),
  i = c(7267.17602463871, 4144.01782553979, 8503.30095553976, 952.380952380952, 4107.39951038459, 2068.94100923952)),
  row.names = 6:11, class = "data.frame")

Recommended answer

a$Doseringer_pr_kg_dyr[is.na(a$Doseringer_pr_kg_dyr)] <- 0
a$x <- ave(a$Doseringer_pr_kg_dyr, a$CHR_NR, FUN = cumsum)

where x is the cumulative sum, and "ave" applies "cumsum" separately within each CHR_NR group, returning the results in the original row order.
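A quick check of the answer on the sample table from the question, typed in by hand. Because "cumsum" accumulates in row order, the running sum only follows the dates if rows are sorted by date within each ID; the "order()" line is an added safeguard and not part of the recommended answer.

```r
# Sample data from the question
a <- data.frame(
  CHR_NR = c(10358, 10358, 10358, 10358, 10358,
             13168, 13168, 13168, 13168, 13168),
  DATO_AFSLUT = as.Date(c("2018-08-06", "2018-09-06", "2018-10-12",
                          "2018-11-07", "2018-12-04", "2012-01-23",
                          "2012-01-25", "2012-03-07", "2012-03-26",
                          "2012-05-02")),
  Doseringer_pr_kg_dyr = c(29416.67, 104682.27, 10333.33, 10090.91, 8000.00,
                           12042.25, 9000.00, 44450.70, 35000.00, 93478.26)
)

# Added safeguard: date order within each ID
a <- a[order(a$CHR_NR, a$DATO_AFSLUT), ]

# The recommended answer: NA -> 0, then a cumulative sum per CHR_NR
a$Doseringer_pr_kg_dyr[is.na(a$Doseringer_pr_kg_dyr)] <- 0
a$x <- ave(a$Doseringer_pr_kg_dyr, a$CHR_NR, FUN = cumsum)

print(a[, c("CHR_NR", "DATO_AFSLUT", "x")])
```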
