How to match dates in 2 data frames in R, then sum specific range of values up to that date?

Question

I have two data frames: rainfall data collected daily, and nitrate concentrations of water samples collected irregularly, approximately once a month. For each nitrate concentration, I would like to compute the sum of the previous 5 days' rainfall. Basically, I need to match the nitrate date with the rain date, sum the previous 5 days' rainfall, then print the sum with the nitrate data.

I think I need to write a function, use a for loop, or use tapply to do this, but I don't know how. I'm not an expert at any of those, though I've used them in simple cases. I've searched for similar posts, but none get at this exactly. This one deals with summing by factor groups. This one deals with summing each possible pair of rows. This one deals with summing with aggregate.

Here are 2 example data frames:

# rainfall df
mm <- c(0,0,0,0,5, 0,0,2,0,0, 10,0,0,0,0)
date <- 1:15
rain <- data.frame(mm, date)
# b/c sums of rainfall depend on correct chronological order, make sure the data are in order by date.
rain <- rain[order(rain$date), ]

# nitrate df
nconc <- c(15, 12, 14, 20, 8.5) # nitrate concentration
ndate <- c(6,8,11,13,14)
nitrate <- data.frame(nconc, ndate)
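Indexing with `order()` returns a reordered copy. It does not sort `rain` in place, so the result has to be assigned back. Otherwise it is only printed at the console. The sketch below shows this with a hypothetical shuffled copy of the rainfall rows:

```r
mm   <- c(0, 0, 0, 0, 5, 0, 0, 2, 0, 0, 10, 0, 0, 0, 0)
rain <- data.frame(mm, date = 1:15)

# Hypothetical: the same rows, but stored out of chronological order
rain_shuffled <- rain[c(15:8, 1:7), ]

# Not assigned: prints a sorted copy, rain_shuffled itself is unchanged
rain_shuffled[order(rain_shuffled$date), ]
is.unsorted(rain_shuffled$date)   # TRUE

# Assigned: rain_shuffled is now in chronological order
rain_shuffled <- rain_shuffled[order(rain_shuffled$date), ]
is.unsorted(rain_shuffled$date)   # FALSE
```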

I would like to have a way of finding the matching rainfall date for each nitrate measurement, such as:

j <- match(nitrate$ndate[i], rain$date)

(Note: Will match work with as.Date dates?) And then sum the preceding 5 days' rainfall (not including the measurement date), such as:

sum(rain$mm[(j-5):(j-1)])

And then print the sum in a new column in nitrate:

print(nitrate$mm_sum[i])
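On the side question: `match()` does work with `Date` objects. It compares the underlying date values and returns the position of the first match, or `NA` if there is none. Note that `match(x, table)` takes the lookup vector as its second argument. By contrast, `%in%` only returns `TRUE`/`FALSE`. The sketch below assumes hypothetical calendar dates, 15 consecutive days starting on 2020-01-01:

```r
# Hypothetical calendar dates: 15 consecutive days starting 2020-01-01
rain <- data.frame(
  mm   = c(0, 0, 0, 0, 5, 0, 0, 2, 0, 0, 10, 0, 0, 0, 0),
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 15)
)
nitrate <- data.frame(
  nconc = c(15, 12, 14, 20, 8.5),
  ndate = as.Date(c("2020-01-06", "2020-01-08", "2020-01-11",
                    "2020-01-13", "2020-01-14"))
)

# Row of `rain` whose date equals each nitrate date (NA if absent)
j <- match(nitrate$ndate, rain$date)
j                                # 6 8 11 13 14

# %in% only answers "is it there?", not "where is it?"
nitrate$ndate %in% rain$date     # TRUE TRUE TRUE TRUE TRUE
```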

To make sure it's clear what result I'm looking for, here's how to do the calculation 'by hand'. The first nitrate concentration was collected on day 6, so the sum of rainfall on days 1-5 is 5mm.
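Here is the same by-hand calculation in R. The five days before day 6 are days `(6 - 5):(6 - 1)`, i.e. 1 to 5. The parentheses matter: in R, `:` binds more tightly than binary minus, so writing the range without them gives something else entirely:

```r
mm <- c(0, 0, 0, 0, 5, 0, 0, 2, 0, 0, 10, 0, 0, 0, 0)
j  <- 6  # day of the first nitrate sample

(j - 5):(j - 1)            # 1 2 3 4 5
sum(mm[(j - 5):(j - 1)])   # 5, matching the by-hand result

# Without parentheses, `:` is evaluated first: j - (5:j) - 1
j - 5:j - 1                # 0 -1, not a range of days
```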

Many thanks in advance.

Solution

You were more or less there!

nitrate$prev_five_rainfall = NA 
for (i in seq_along(nitrate$ndate)) { 
    day = nitrate$ndate[i] 
    nitrate$prev_five_rainfall[i] = sum(rain$mm[(day-5):(day-1)])
}

Step by step explanation:

Initialize an empty result column:

nitrate$prev_five_rainfall = NA 

For each row in the nitrate df (i = 1, 2, 3, 4, 5):

for (i in seq_along(nitrate$ndate)) { 

Grab the day we want the final result for:

    day = nitrate$ndate[i] 

Sum the rainfall on the five days before that day, (day-5) through (day-1), and put it in the results column:

    nitrate$prev_five_rainfall[i] = sum(rain$mm[(day-5):(day-1)])

Close the for loop :)

}
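The question also mentioned writing a function. The same five-day sum can be wrapped in a small helper and applied to every sample with `sapply()` instead of an explicit loop. This is a sketch: `prev_rain_sum` is a hypothetical name, the window is `(day - n):(day - 1)` with `n = 5`, and like the loop it assumes row k of `rain` holds day k:

```r
# Hypothetical helper: total rainfall over the n days before `day`,
# assuming rain_mm[k] is the rainfall on day k
prev_rain_sum <- function(day, rain_mm, n = 5) {
  sum(rain_mm[(day - n):(day - 1)])
}

mm      <- c(0, 0, 0, 0, 5, 0, 0, 2, 0, 0, 10, 0, 0, 0, 0)
rain    <- data.frame(mm = mm, date = 1:15)
nitrate <- data.frame(nconc = c(15, 12, 14, 20, 8.5),
                      ndate = c(6, 8, 11, 13, 14))

# Call the helper once per nitrate date; rain_mm is passed through sapply
nitrate$prev_five_rainfall <- sapply(nitrate$ndate, prev_rain_sum,
                                     rain_mm = rain$mm)
nitrate$prev_five_rainfall   # 5 5 2 12 10
```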


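Disclaimer: This answer is basic in that:

  • It will fail with an error, or silently return a wrong sum, if nitrate's ndate < 6 (fewer than five earlier days of rain data exist)
  • It indexes rain by row position, assuming row k is day k, so it will be incorrect if some dates are missing from the rain data frame
  • It may be slow on larger data, because it loops over the samples one at a time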
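The sketch below addresses the first two points. It looks rain up by date value with `match()` instead of by row position. It returns `NA` whenever any of the five preceding days is absent from `rain`, which includes samples taken before day 6, rather than silently summing fewer days. The helper name, the `NA` policy, and the example with day 2 dropped from the rain record are assumptions for illustration. The same code should also work when the dates are `Date` objects:

```r
# Hypothetical helper: rainfall over the n days before each sample date,
# matched by date value; NA if any of those days is absent from `rain`
prev_rain_by_date <- function(sample_dates, rain_dates, rain_mm, n = 5) {
  vapply(seq_along(sample_dates), function(i) {
    wanted <- sample_dates[i] - seq_len(n)   # the n days before sample i
    pos    <- match(wanted, rain_dates)      # their rows in rain (NA if absent)
    if (anyNA(pos)) NA_real_ else sum(rain_mm[pos])
  }, numeric(1))
}

# Example rain record with day 2 missing (hypothetical gap)
rain <- data.frame(mm   = c(0, 0, 0, 5, 0, 0, 2, 0, 0, 10, 0, 0, 0, 0),
                   date = c(1, 3:15))
nitrate <- data.frame(nconc = c(15, 12, 14, 20, 8.5),
                      ndate = c(6, 8, 11, 13, 14))

nitrate$prev_five_rainfall <- prev_rain_by_date(nitrate$ndate,
                                                rain$date, rain$mm)
nitrate$prev_five_rainfall            # NA 5 2 12 10
prev_rain_by_date(3, rain$date, rain$mm)   # NA: days 0, -1, -2 don't exist
```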

As you get more experience with R, you might use data manipulation packages like dplyr or data.table for these types of manipulations.
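The "slow on larger data" point can often be handled in base R before reaching for dplyr or data.table. Precompute a cumulative sum once, and every five-day total becomes the difference of two lookups, with no loop at all. This is a base-R swap-in, not a dplyr or data.table method. The sketch assumes the same layout as the loop (complete, sorted daily data where row k is day k) and that every ndate is at least 6. With decimal rainfall values, the differences may carry tiny floating-point rounding:

```r
mm    <- c(0, 0, 0, 0, 5, 0, 0, 2, 0, 0, 10, 0, 0, 0, 0)
ndate <- c(6, 8, 11, 13, 14)

# cs[k + 1] is the total rainfall on days 1..k (cs[1] = 0 for "no days")
cs <- c(0, cumsum(mm))

# Days (d - 5)..(d - 1) total cs[d] - cs[d - 5], computed for all d at once
prev_five <- cs[ndate] - cs[ndate - 5]
prev_five   # 5 5 2 12 10
```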
