Using a nested lookup table to find values above thresholds in second table and quantify them in R


Problem description

I'm analyzing river streamflow data in R and have two nested lists. The first, Flowtest, holds data from different river reaches, identified by numbers such as 910, 950, 1012 and 1087. I have hundreds of daily streamflow measurements (Flow), but since I'm preparing yearly statistics, the exact day and month don't matter. Each measurement (Flow) in a Flowtest table is referenced to a year (Year).

library(tibble)  # provides tibble()

Flowtest <- list("910" = tibble(Year = c(2004, 2004, 2005, 2005, 2007, 2008, 2008), Flow=c(123, 170, 187, 245, 679, 870, 820)),
                 "950" = tibble(Year = c(2004, 2005, 2005, 2005, 2006, 2008, 2008), Flow=c(570, 450, 780, 650, 230, 470, 340)),
                 "1012" = tibble(Year = c(2005, 2005, 2005, 2005, 2007, 2008, 2008), Flow=c(160, 170, 670, 780, 350, 840, 850)),
                 "1087" = tibble(Year = c(2004, 2005, 2005, 2007, 2007, 2008, 2008), Flow=c(120, 780, 820, 580, 870, 870, 840)))

The second nested list, RCHtest, serves as a lookup table. I calculated the 75th percentile (Q3, the 0.75 quantile) on a different streamflow dataset than Flowtest, so I don't want to use a Q3 calculated from Flowtest. RCHtest therefore holds one Q3 threshold for each analyzed year (Year). The analyzed years and river reaches are the same in Flowtest and RCHtest.

RCHtest <- list("910" = data.frame(Year = c(2004:2008), Q3=c(650, 720, 550, 580, 800)),
                "950" = data.frame(Year = c(2004:2008), Q3=c(550, 770, 520, 540, 790)),
                "1012" = data.frame(Year = c(2004:2008), Q3=c(600, 780, 500, 570, 800)),
                "1087" = data.frame(Year = c(2004:2008), Q3=c(670, 790, 510, 560, 780)))

What I would like to obtain is the number of values in Flowtest$Flow that fall above the threshold specified in RCHtest$Q3, per year and per river reach (subbasin), as shown in Resulttest below.

Resulttest <- list("910" = data.frame(Year = c(2004:2008), aboveQ3=c(0, 0, 0, 1, 2)),
                  "950" = data.frame(Year = c(2004:2008), aboveQ3=c(1, 1, 0, 0, 0)),
                  "1012" = data.frame(Year = c(2004:2008), aboveQ3=c(0, 0, 0, 0, 2)),
                  "1087" = data.frame(Year = c(2004:2008), aboveQ3=c(0, 1, 0, 2, 2)))

How can I approach this? Any help is appreciated.

Recommended answer

You can combine Map and aggregate:

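# Outer-join each pair on Year (keeps years with no Flow), then count Flow > Q3 per year.
# na.action = na.pass keeps rows where Flow is NA; na.rm = TRUE ignores them in the sum.
Map(function(x, y) aggregate(Flow > Q3 ~ Year, merge(x, y, all = TRUE),
                             sum, na.rm = TRUE, na.action = na.pass),
    Flowtest, RCHtest)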

This returns:

#$`910`
#  Year Flow > Q3
#1 2004         0
#2 2005         0
#3 2006         0
#4 2007         1
#5 2008         2

#$`950`
#  Year Flow > Q3
#1 2004         1
#2 2005         1
#3 2006         0
#4 2007         0
#5 2008         0

#$`1012`
#  Year Flow > Q3
#1 2004         0
#2 2005         0
#3 2006         0
#4 2007         0
#5 2008         2

#$`1087`
#  Year Flow > Q3
#1 2004         0
#2 2005         1
#3 2006         0
#4 2007         2
#5 2008         2
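
The result column above is literally named "Flow > Q3", which is awkward to reference afterwards. A minimal base R sketch of one way to get an aboveQ3 column like the one in Resulttest is shown here. It uses a two-reach subset of the question's data with plain data frames, and count_above is a hypothetical helper name that does not appear in the original answer.

```r
# Two-reach subset of the question's data (base data.frame instead of tibble).
Flowtest <- list(
  "910"  = data.frame(Year = c(2004, 2004, 2005, 2005, 2007, 2008, 2008),
                      Flow = c(123, 170, 187, 245, 679, 870, 820)),
  "1012" = data.frame(Year = c(2005, 2005, 2005, 2005, 2007, 2008, 2008),
                      Flow = c(160, 170, 670, 780, 350, 840, 850))
)
RCHtest <- list(
  "910"  = data.frame(Year = 2004:2008, Q3 = c(650, 720, 550, 580, 800)),
  "1012" = data.frame(Year = 2004:2008, Q3 = c(600, 780, 500, 570, 800))
)

# count_above() is a hypothetical helper name, not part of the original answer.
count_above <- function(flow, thresholds) {
  # Outer join on Year so years without any measurement are kept (Flow = NA)
  merged <- merge(flow, thresholds, by = "Year", all = TRUE)
  # Flag strict exceedances up front so the output column gets a clean name
  merged$aboveQ3 <- merged$Flow > merged$Q3
  # na.pass keeps the NA rows; na.rm = TRUE makes them count as 0 in the sum
  aggregate(aboveQ3 ~ Year, data = merged, FUN = sum,
            na.rm = TRUE, na.action = na.pass)
}

result <- Map(count_above, Flowtest, RCHtest)
print(result)
```

Because the comparison is strict, a flow equal to its threshold (780 against a Q3 of 780 for reach 1012 in 2005) is not counted.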

If you want to do this using tidyverse functions, you can do:

library(dplyr)
library(purrr)

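map2(Flowtest, RCHtest, ~full_join(.x, .y, by = "Year") %>%  # explicit join key
                          group_by(Year) %>%
                          # count measurements above the threshold; NA Flow rows count as 0
                          summarise(aboveQ3 = sum(Flow > Q3, na.rm = TRUE)))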
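
As a variation on the per-list approach, the two nested lists could also be stacked into long tables and joined once. The sketch below is not from the original answer and assumes dplyr is installed. It uses a two-reach subset of the question's data, and the column name Reach and the object names flows, thresholds and long_result are choices made here for illustration.

```r
library(dplyr)

# Two-reach subset of the question's data; plain data frames keep the sketch short.
Flowtest <- list(
  "910" = data.frame(Year = c(2004, 2004, 2005, 2005, 2007, 2008, 2008),
                     Flow = c(123, 170, 187, 245, 679, 870, 820)),
  "950" = data.frame(Year = c(2004, 2005, 2005, 2005, 2006, 2008, 2008),
                     Flow = c(570, 450, 780, 650, 230, 470, 340))
)
RCHtest <- list(
  "910" = data.frame(Year = 2004:2008, Q3 = c(650, 720, 550, 580, 800)),
  "950" = data.frame(Year = 2004:2008, Q3 = c(550, 770, 520, 540, 790))
)

# Stack each list into one long table; the list names end up in the "Reach" column.
flows      <- bind_rows(Flowtest, .id = "Reach")
thresholds <- bind_rows(RCHtest, .id = "Reach")

# Start from the lookup table so every Reach/Year pair appears in the result,
# even when no flow was measured that year (Flow is then NA and counts as 0).
long_result <- thresholds %>%
  left_join(flows, by = c("Reach", "Year")) %>%
  group_by(Reach, Year) %>%
  summarise(aboveQ3 = sum(Flow > Q3, na.rm = TRUE), .groups = "drop")

print(long_result)
```

A single long table is often easier to plot or export than a list of data frames. If the list form is needed again, split(long_result, long_result$Reach) would recover one table per reach.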
