How to match dates in 2 data frames in R, then sum a specific range of values up to that date?
Problem description
I have two data frames: rainfall data collected daily and nitrate concentrations of water samples collected irregularly, approximately once a month. I would like to create a vector of values for each nitrate concentration that is the sum of the previous 5 days' rainfall. Basically, I need to match the nitrate date with the rain date, sum the previous 5 days' rainfall, then print the sum with the nitrate data.
I think I need to either write a function, use a for loop, or use tapply to do this, but I don't know how. I'm not an expert at any of those, though I've used them in simple cases. I've searched for similar posts, but none get at this exactly. The ones I found deal with summing by factor groups, summing each possible pair of rows, and summing with aggregate.
Here are 2 example data frames:
# rainfall df
mm <- c(0,0,0,0,5, 0,0,2,0,0, 10,0,0,0,0)  # daily rainfall in mm
date <- 1:15
rain <- data.frame(mm, date)
# Rainfall sums depend on correct chronological order, so sort the data by date
# (and assign the result, otherwise the sorted copy is only printed).
rain <- rain[order(rain$date), ]
# nitrate df
nconc <- c(15, 12, 14, 20, 8.5)  # nitrate concentration
ndate <- c(6, 8, 11, 13, 14)
nitrate <- data.frame(nconc, ndate)
I would like to have a way of finding the matching rainfall date for each nitrate measurement, such as:
match(nitrate$date[i] %in% rain$date)
(Note: Will match work with as.Date dates?) And then sum the preceding 5 days' rainfall (not including the measurement date), such as:
sum(rain$mm[j-6:j-1])
And print the sum in a new column in nitrate:
print(nitrate$mm_sum[i])
To make sure it's clear what result I'm looking for, here's how to do the calculation 'by hand'. The first nitrate concentration was collected on day 6, so the sum of rainfall on days 1-5 is 5mm.
Many thanks in advance.
You were more or less there!
nitrate$prev_five_rainfall <- NA
for (i in seq_along(nitrate$ndate)) {
  day <- nitrate$ndate[i]
  # rain$date runs 1, 2, 3, ... so the day number doubles as the row index
  nitrate$prev_five_rainfall[i] <- sum(rain$mm[(day - 5):(day - 1)])
}
Step-by-step explanation:
Initialize an empty result column:
nitrate$prev_five_rainfall <- NA
For each row in the nitrate df (i = 1, 2, 3, 4, 5):
for (i in seq_along(nitrate$ndate)) {
Grab the day we want the final result for:
day <- nitrate$ndate[i]
Take the rainfall sum over the five preceding days, (day - 5) to (day - 1), and put it in the results column:
nitrate$prev_five_rainfall[i] <- sum(rain$mm[(day - 5):(day - 1)])
Close the for loop :)
}
Disclaimer: This answer is basic in that:
- It will break if nitrate's ndate < 6
- It will be incorrect if some dates are missing in the rain dataframe
- It will be slow on larger data
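The first two caveats above can be softened by selecting rows by date value instead of by row position. The following is only a sketch under assumptions: `sum_prev_days` is a hypothetical helper name (not from any package), and the start date `2020-01-01` is invented purely to show that the same comparison works on `as.Date` values and that `match` accepts them.

```r
# Example data from the question
rain <- data.frame(mm = c(0,0,0,0,5, 0,0,2,0,0, 10,0,0,0,0), date = 1:15)
nitrate <- data.frame(nconc = c(15, 12, 14, 20, 8.5), ndate = c(6, 8, 11, 13, 14))

# Hypothetical helper: for each target date, sum `values` over the `n_days`
# days strictly before it, selecting rows by date rather than by position.
sum_prev_days <- function(target_dates, dates, values, n_days = 5) {
  vapply(seq_along(target_dates), function(i) {
    d <- target_dates[i]                          # indexing keeps the Date class
    in_window <- dates >= d - n_days & dates < d  # excludes the sampling day
    sum(values[in_window], na.rm = TRUE)
  }, numeric(1))
}

nitrate$prev_five_rainfall <- sum_prev_days(nitrate$ndate, rain$date, rain$mm)

# The same call with Date objects (the start date is an assumption)
start <- as.Date("2020-01-01")
rain_dates <- start + rain$date - 1
sample_dates <- start + nitrate$ndate - 1
date_sums <- sum_prev_days(sample_dates, rain_dates, rain$mm)
matched_rows <- match(sample_dates, rain_dates)   # match() works on Dates too
```

With this approach a sampling day earlier than day 6, or a gap in the rain record, no longer causes an error or a shifted window. The sum then silently covers fewer than five days, so it may be worth checking `sum(in_window)` if complete windows matter.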
As you get more experience with R, you might use data manipulation packages like dplyr or data.table for these types of manipulations.
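As one possible illustration of the dplyr route, the sketch below computes the same five-day sum row by row with `rowwise()` and `mutate()`. It assumes dplyr is installed and reuses the example data from the question. The result name `nitrate_sums` is made up for this example, and this is one way to write it rather than the only or the fastest one.

```r
library(dplyr)

# Example data from the question
rain <- data.frame(mm = c(0,0,0,0,5, 0,0,2,0,0, 10,0,0,0,0), date = 1:15)
nitrate <- data.frame(nconc = c(15, 12, 14, 20, 8.5), ndate = c(6, 8, 11, 13, 14))

nitrate_sums <- nitrate %>%
  rowwise() %>%   # evaluate the expression once per nitrate sample
  mutate(
    # rainfall on the five days strictly before the sampling day
    prev_five_rainfall = sum(rain$mm[rain$date >= ndate - 5 & rain$date < ndate])
  ) %>%
  ungroup()
```

Because the rows are chosen by comparing dates, this version also tolerates missing days in the rain data, although it still scans the whole rain table once per sample.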