In R: how to sum a variable by group between two dates

Views: 212 Posted: 2020/10/18 4:16:41 Tags: r date for-loop sum


Problem description

I have two data frames (DF1 and DF2):

(1) DF1 contains individual-level information on 10,000 individuals nested in 30 units across the years 2000-2011. It contains four variables:

"individual" (numeric id for each individual; ranging from 1 to 10,000)
"unit" (numeric id for each unit; ranging from 1 to 30)
"date1" (a date in Date format, e.g. 2000-01-01; ranging from 2000-01-01 to 2010-12-31)
"date2" ("date1" + 1 year)

(2) DF2 contains unit-level information on the same 30 units as in DF1 across the same time period (2000-2011), and further contains a numeric variable ("x"):

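"unit" (numeric id for each unit; ranging from 1 to 30)
"date" (a date in Date format, e.g. 2000-01-01; ranging from 2000-01-01 to 2011-12-31)
"x" (a numeric variable; ranging from 0 to 200)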

I would like to create a new variable ("newvar") that gives, for each "individual" per "unit", the sum of "x" (DF2) over the period from "date1" to "date2" (both in DF1). In other words, I would like to add this new variable to DF1.

For instance, if "individual"=1 in "unit"=1 has "date1"=2000-01-01 and "date2"=2001-01-01, and in DF2 "unit"=1 has three observations in the time period "date1" to "date2" (i.e. 2000-01-01 to 2001-01-01) with "x"=1, "x"=2 and "x"=3, then I would like to add a new variable that gives "newvar"=6 for "individual"=1 in "unit"=1.
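The single-row case described above can be sketched as follows. This is only an illustration: the observation dates are taken from the sample DF2 shown further down, and treating both window boundaries as inclusive is an assumption, since the question does not say how the endpoints should be handled.

```r
# Toy DF2 for unit 1: three observations inside the window and one outside it
DF2 <- data.frame(
  unit = c(1, 1, 1, 1),
  date = as.Date(c("2000-01-01", "2000-05-01", "2000-12-01", "2001-01-02")),
  x    = c(1, 2, 3, 10)
)

# Window for individual 1 in unit 1
date1 <- as.Date("2000-01-01")
date2 <- as.Date("2001-01-01")

# Keep rows of the same unit whose date lies in [date1, date2] (inclusive,
# by assumption), then sum x over those rows
in_window <- DF2$unit == 1 & DF2$date >= date1 & DF2$date <= date2
newvar <- sum(DF2$x[in_window])
newvar  # 1 + 2 + 3 = 6
```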

I assume that I need to use a for loop in R and have been using the following code:

for(i in length(DF1)){

DF1$newvar[i] <-sum(DF2$x[which(DF1$date == DF1$date1[i] &
                     DF1$date == DF1P$date1[i] &
                     DF2$unit == DF1P$unit[i]),])

}

but receive the error message:

Error in DF2$x[which(DF2$date ==  : incorrect number of dimensions
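The message most likely comes from the trailing `, ]` inside `DF2$x[...]`: `DF2$x` is a plain vector, so it cannot be indexed with two dimensions. Two other details in the loop also look unintended: `for(i in length(DF1))` runs only once (a data frame's length is its number of columns), and the conditions test for equality instead of a date range in DF2. Below is a minimal corrected sketch, assuming the date columns are of class `Date` and that both window boundaries are inclusive; the data are the first rows of the sample tables, with "date2" set to "date1" + 1 year.

```r
# Sample rows for unit 1, taken from the example tables
DF1 <- data.frame(
  individual = 1:5,
  unit  = 1,
  date1 = as.Date(c("2000-01-01", "2000-02-02", "2000-03-03",
                    "2000-04-04", "2000-12-31")),
  date2 = as.Date(c("2001-01-01", "2001-02-02", "2001-03-03",
                    "2001-04-04", "2001-12-31"))
)
DF2 <- data.frame(
  unit = 1,
  date = as.Date(c("2000-01-01", "2000-05-01", "2000-12-01",
                   "2001-01-02", "2001-07-05", "2001-12-31")),
  x    = c(1, 2, 3, 10, 20, 30)
)

DF1$newvar <- NA_real_
# Iterate over the rows of DF1, not over its number of columns
for (i in seq_len(nrow(DF1))) {
  # Rows of DF2 in the same unit whose date lies in [date1, date2]
  in_window <- DF2$unit == DF1$unit[i] &
               DF2$date >= DF1$date1[i] &
               DF2$date <= DF1$date2[i]
  # DF2$x is a vector, so a single logical index is enough
  DF1$newvar[i] <- sum(DF2$x[in_window])
}
DF1
```

With these rows the loop gives 6 for individual 1 and 60 for individual 5. A row-by-row loop like this is easy to check but may be slow for 10,000 individuals and many observations per unit.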

Any ideas of how to create this variable would be tremendously appreciated!

Here is a small example as well as the expected output, using one unit for the sake of simplicity:

Assume DF1 looks as follows:

individual  unit  date1        date2   
1           1     2000-01-01   2001-01-01
2           1     2000-02-02   2001-02-02
3           1     2000-03-03   2001-03-03
4           1     2000-04-04   2001-04-04
5           1     2000-12-31   2001-12-31 
(...)
996         1     2010-01-01   2011-01-01
997         1     2010-02-15   2011-02-15
998         1     2010-03-05   2011-03-05
999         1     2010-04-10   2011-04-10
1000        1     2010-12-27   2011-12-27
1001        2     2000-01-01   2001-01-01
1002        2     2000-02-02   2001-02-02
1003        2     2000-03-03   2001-03-03
1004        2     2000-04-04   2001-04-04
1005        2     2000-12-31   2001-12-31 
(...)
1996        2     2010-01-01   2011-01-01
1997        2     2010-02-15   2011-02-15
1998        2     2010-03-05   2011-03-05
1999        2     2010-04-10   2011-04-10
2000        2     2010-12-27   2011-12-27
(...)
3000        34    2000-02-02   2001-02-02
3001        34    2000-05-05   2001-05-05
3002        34    2000-06-06   2001-06-06
3003        34    2000-07-07   2001-07-07
3004        34    2000-11-11   2001-11-11
(...)
9996        34    2010-02-06   2011-02-06
9997        34    2010-05-05   2011-05-05
9998        34    2010-09-09   2011-09-09 
9999        34    2010-09-25   2011-09-25
10000       34    2010-10-15   2011-10-15

Assume DF2 looks as follows:

unit      date         x
1         2000-01-01   1
1         2000-05-01   2
1         2000-12-01   3
1         2001-01-02   10
1         2001-07-05   20
1         2001-12-31   30
(...) 
2         2010-05-05   1 
2         2010-07-01   1
2         2010-08-09   1
3         (...)

This is what I would like DF1 to look like after running the code:

individual  unit      date1        date2        newvar  
    1           1     2000-01-01   2001-01-01   6
    2           1     2000-02-02   2001-02-02   15
    3           1     2000-03-03   2001-03-03   15
    4           1     2000-04-04   2001-04-04   15
    5           1     2000-12-31   2001-12-31   60
    (...)
    996         1     2010-01-01   2011-01-01    3
    997         1     2010-02-15   2011-02-15    2
    998         1     2010-03-05   2011-03-05    2
    999         1     2010-04-10   2011-04-10    2
    1000        1     2010-12-27   2011-12-27    0
    (...)

However, I cannot simply aggregate: imagine that in DF1 each "unit" has several hundred individuals for each year between 2000 and 2011, and that DF2 has many observations for each unit across the years 2000-2011.
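For data of this size, one possible approach in base R is to sort each unit's observations by date once, take cumulative sums of "x", and read each window's sum off as a difference of two cumulative values located with `findInterval()`. The sketch below is not a definitive solution: the helper name `sum_between` is hypothetical, and it assumes whole-day `Date` columns, inclusive window boundaries, and no missing values in "x".

```r
# Hypothetical helper: adds newvar to DF1 using per-unit cumulative sums
sum_between <- function(DF1, DF2) {
  DF1$newvar <- 0
  for (u in unique(DF1$unit)) {
    d2 <- DF2[DF2$unit == u, ]
    if (nrow(d2) == 0) next            # no observations: newvar stays 0
    d2 <- d2[order(d2$date), ]
    dates <- as.numeric(d2$date)
    # cs[k + 1] is the sum of x over the first k observations; cs[1] is 0
    cs <- c(0, cumsum(d2$x))
    rows <- DF1$unit == u
    # Number of observations dated on or before date2
    hi <- findInterval(as.numeric(DF1$date2[rows]), dates)
    # Number of observations dated strictly before date1 (whole-day dates)
    lo <- findInterval(as.numeric(DF1$date1[rows]) - 1, dates)
    DF1$newvar[rows] <- cs[hi + 1] - cs[lo + 1]
  }
  DF1
}

# Sample rows from the example tables (units 1 and 2)
DF1 <- data.frame(
  individual = c(1:5, 1996),
  unit  = c(1, 1, 1, 1, 1, 2),
  date1 = as.Date(c("2000-01-01", "2000-02-02", "2000-03-03",
                    "2000-04-04", "2000-12-31", "2010-01-01")),
  date2 = as.Date(c("2001-01-01", "2001-02-02", "2001-03-03",
                    "2001-04-04", "2001-12-31", "2011-01-01"))
)
DF2 <- data.frame(
  unit = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  date = as.Date(c("2000-01-01", "2000-05-01", "2000-12-01",
                   "2001-01-02", "2001-07-05", "2001-12-31",
                   "2010-05-05", "2010-07-01", "2010-08-09")),
  x    = c(1, 2, 3, 10, 20, 30, 1, 1, 1)
)

res <- sum_between(DF1, DF2)
res
```

The loop here runs once per unit (30 times in the setting described) and not once per individual, so the work per individual reduces to two sorted lookups.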
