For() loop to ID dates that are between others and calculate a mean value


Problem description

This is a re-post of "R: For() loop checking if date is between two dates in separate object", changed to include a minimal mock/test example following the suggestions of Henrik and Metrics. Thanks to them.

I have two large datasets, both of which contain columns of date/time fields. My first dataset has a single date; the second has two dates. In short, I am trying to find all dates from the first dataset that fall between the two dates of the second, and then find an average value. To provide clarity, I have created a minimal mock dataset using values rather than dates.

The head() of my first mock data set is below – as well as the dput() output. The data is specific to an individual noted by the IndID column.

  IndID MockDate RandNumber
1     1        5   1.862084
2     1        3   1.103154
3     1        5   1.373760
4     1        1   1.497397
5     1        1   1.319488
6     1        3   2.120354

actData <- structure(list(IndID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L), MockDate = c(5L, 3L, 5L, 1L, 1L, 3L, 4L, 
2L, 2L, 5L, 2L, 1L, 5L, 3L, 5L, 3L, 5L, 3L, 5L, 1L, 5L, 3L, 5L, 
5L, 2L, 3L, 1L, 4L, 3L, 3L), RandNumber = c(1.862083679, 1.103154127, 
1.37376001, 1.497397482, 1.319487885, 2.120353884, 1.895660195, 
1.150411874, 2.61036961, 1.99354158, 1.547706758, 1.941501873, 
1.739226419, 2.455590044, 2.907382515, 2.110502618, 2.076187012, 
2.507527308, 2.167657681, 1.662405916, 2.428807116, 2.04699653, 
1.937335768, 1.456518889, 1.948952907, 2.104325112, 2.311519732, 
2.092650229, 2.109051215, 2.089144475)), .Names = c("IndID", 
"MockDate", "RandNumber"), class = "data.frame", row.names = c(NA, 
-30L))

The head() of my 2nd mock data set is below – as well as the dput() output.

  IndID StartTime EndTime
1     1         4       5
2     1         7      11
3     1         6       9
4     1         7       9
5     1         6      10
6     1         2      12

clstrData <- structure(list(IndID.1 = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), StartTime = c(4L, 7L, 
6L, 7L, 6L, 2L, 6L, 4L, 3L, 5L, 2L, 5L, 7L, 3L, 4L, 3L, 2L, 5L, 
5L), EndTime = c(5L, 11L, 9L, 9L, 10L, 12L, 8L, 13L, 5L, 13L, 
9L, 9L, 17L, 6L, 8L, 6L, 9L, 15L, 7L)), .Names = c("IndID", 
"StartTime", "EndTime"), row.names = c(NA, 19L), class = "data.frame")

The second dataset has two number fields representing a start and end time. As above, these data are also specific to an individual noted by the IndID column.

I need to average the ‘RandNumber’ from dataset one for all the instances when ‘MockDate’ is between ‘StartTime’ and ‘EndTime’ of the second dataset for each unique IndID. Thus, ‘RandNumber’ values should only be averaged if 1) they are within the ‘StartTime’ and ‘EndTime’ and 2) the IndID for both rows are the same.

I started by creating a function to ID if MockDate is between StartTime and EndTime

is.between <- function(x, a, b) {
    x > a & x < b
}
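A quick check of how this helper behaves may be useful here. The sketch below repeats the function and calls it on a few plain numbers; note that the comparison is strict, so a value equal to either endpoint is not counted as "between". The inclusive variant `is.between.incl` is a hypothetical name added only for illustration and is not part of the original post.

```r
# Strict version from the question: endpoints are excluded
is.between <- function(x, a, b) {
    x > a & x < b
}

# Hypothetical inclusive variant (an assumption, not in the original post)
is.between.incl <- function(x, a, b) {
    x >= a & x <= b
}

is.between(3, 2, 5)             # TRUE
is.between(5, 4, 5)             # FALSE: 5 equals the upper endpoint
is.between(c(1, 3, 5), 2, 12)   # FALSE TRUE TRUE (vectorised over x)
is.between.incl(5, 4, 5)        # TRUE
```

Because the mock values are small integers, whether endpoints should count can change which rows are averaged, so it is worth deciding that explicitly for real date/time data.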

Testing that the function works for a single value:

is.between(actData[1,3], clstrData[,2], clstrData[,3])

But cannot figure out how to loop this for all rows, and then find the mean. My for() loop beginnings are below.

YesNo <- list()
for (i in 1:nrow(actData)) {
YesNo[[i]] <- is.between(actData[1,3], clstrData[,2], clstrData[,3])
}
YesNo[[3]]

This for() gives the same result for all rows…
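One likely reason for the identical results is that the loop body always reads `actData[1,3]`, so the loop variable `i` never reaches the index. It also reads column 3 (RandNumber), whereas the stated goal compares MockDate (column 2). The following is a minimal sketch of the indexing fix only, using a small subset of the mock values above stored in plain vectors; the vector names `mockDate`, `startTime` and `endTime` are assumptions made for illustration.

```r
is.between <- function(x, a, b) x > a & x < b

# First four MockDate values for IndID 1, and rows 1, 2, 3 and 6 of the
# StartTime/EndTime columns (a hand-picked subset for illustration)
mockDate  <- c(5L, 3L, 5L, 1L)
startTime <- c(4L, 7L, 6L, 2L)
endTime   <- c(5L, 11L, 9L, 12L)

YesNo <- vector("list", length(mockDate))
for (i in seq_along(mockDate)) {
    # Index with i (not a fixed 1) so each iteration tests a different value
    YesNo[[i]] <- is.between(mockDate[i], startTime, endTime)
}

YesNo[[1]]   # MockDate 5 tested against every window
YesNo[[4]]   # MockDate 1 tested against every window
```

With the index varying, the list elements are no longer all the same: the value 5 falls inside the window (2, 12), while the value 1 falls inside none.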

Hope to create... clstrData$NEWcolum <- mean RandNum for each row.

Thanks, and as always any suggestions are greatly appreciated!

Solution

Thanks to Ricardo Saporta for earlier thoughts.

However, constructing a long conditional in my for() loop was the best option for me - although not as fast as data.table().

Using the data above, the code below is what I ended up constructing.

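# Pre-allocate the result column
clstrData$meanAct <- rep(NA_real_, nrow(clstrData))

for (i in seq_len(nrow(clstrData))) {
    # Rows of actData for the same individual whose MockDate lies inside
    # the (StartTime, EndTime) window of row i
    keep <- actData$IndID == clstrData$IndID[i] &
        is.between(actData$MockDate, clstrData$StartTime[i], clstrData$EndTime[i])
    clstrData$meanAct[i] <- mean(actData$RandNumber[keep])
}
head(clstrData)
tail(clstrData)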

Where there is no corresponding value between the Start and End times, NaNs are produced.
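As a hedged alternative to the explicit loop, the same per-row mean can be sketched with `mapply()`, which also makes it easy to turn the NaN results into NA afterwards. The data frames `actSub` and `clstrSub` below are assumed names holding only the IndID 1 rows of the mock data above, so the block runs on its own. Following the problem statement, the comparison is made on MockDate.

```r
is.between <- function(x, a, b) x > a & x < b

# IndID 1 rows of the first mock data set
actSub <- data.frame(
    IndID      = rep(1L, 8),
    MockDate   = c(5L, 3L, 5L, 1L, 1L, 3L, 4L, 2L),
    RandNumber = c(1.862083679, 1.103154127, 1.37376001, 1.497397482,
                   1.319487885, 2.120353884, 1.895660195, 1.150411874)
)

# IndID 1 rows of the second mock data set
clstrSub <- data.frame(
    IndID     = rep(1L, 6),
    StartTime = c(4L, 7L, 6L, 7L, 6L, 2L),
    EndTime   = c(5L, 11L, 9L, 9L, 10L, 12L)
)

# One mean per row of clstrSub: same individual, MockDate strictly inside
# the window. mean() of an empty selection returns NaN.
clstrSub$meanAct <- mapply(
    function(id, start, end) {
        keep <- actSub$IndID == id & is.between(actSub$MockDate, start, end)
        mean(actSub$RandNumber[keep])
    },
    clstrSub$IndID, clstrSub$StartTime, clstrSub$EndTime
)

# Optional: mark windows with no matching rows as NA instead of NaN
clstrSub$meanAct[is.nan(clstrSub$meanAct)] <- NA

clstrSub
```

On this subset only the last window (2, 12) contains any MockDate values, so the other rows end up as NA. This is still a row-by-row computation and is not expected to be faster than the loop; it is only a more compact way to write it.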
