How to subset consecutive rows if they meet a condition
Question
I am using R to analyze a number of time series (1951-2013) containing daily values of Max and Min temperatures. The data has the following structure:
YEAR MONTH DAY MAX MIN
1985 1 1 22.8 9.4
1985 1 2 28.6 11.7
1985 1 3 24.7 12.2
1985 1 4 17.2 8.0
1985 1 5 17.9 7.6
1985 1 6 17.7 8.1
I need to find the frequency of heat waves based on this definition: A period of three or more consecutive days with a daily maximum and minimum temperature exceeding the 90th percentile of the maximum and minimum temperatures for all days in the studied period.
Basically, I want to subset those consecutive days (three or more) when the Max and Min temp exceed a threshold value. The output would be something like this:
YEAR MONTH DAY MAX MIN
1989 7 18 45.0 23.5
1989 7 19 44.2 26.1
1989 7 20 44.7 24.4
1989 7 21 44.6 29.5
1989 7 24 44.4 31.6
1989 7 25 44.2 26.7
1989 7 26 44.5 25.0
1989 7 28 44.8 26.0
1989 7 29 44.8 24.6
1989 8 19 45.0 24.3
1989 8 20 44.8 26.0
1989 8 21 44.4 24.0
1989 8 22 45.2 25.0
I have tried the following to subset my full dataset to just the days on which both MAX and MIN reach their 90th percentiles:
HW <- subset(Mydata, MAX >= quantile(MAX, 0.9) &
                     MIN >= quantile(MIN, 0.9))
However, I am stuck on how to subset only the consecutive days that meet the condition.
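For comparison, the consecutive-day step can also be sketched in base R. This is a minimal sketch, not part of the original question or answer: the helper name `heatwave_days` and its `prob`/`min_len` arguments are hypothetical, and it assumes `Mydata` has the YEAR, MONTH, DAY, MAX and MIN columns shown above with no missing values. It flags hot days with `>=`, as in the attempt above (use `>` for strict exceedance), numbers runs of equal hot/not-hot status with `cumsum()`, and also starts a new run whenever a calendar day is missing, so a gap in the record cannot join two separate runs.

```r
# Hypothetical helper (not from the original post): returns the rows of `dat`
# that fall in runs of at least `min_len` consecutive calendar days on which
# both MAX and MIN reach their `prob` quantiles over the whole data set.
# Assumes columns YEAR, MONTH, DAY, MAX, MIN and no missing temperature values.
heatwave_days <- function(dat, prob = 0.9, min_len = 3) {
  dat <- dat[order(dat$YEAR, dat$MONTH, dat$DAY), ]
  date <- as.Date(sprintf("%d-%02d-%02d", dat$YEAR, dat$MONTH, dat$DAY))
  hot <- dat$MAX >= quantile(dat$MAX, prob) & dat$MIN >= quantile(dat$MIN, prob)

  # A new run starts when the hot/not-hot status changes or a day is missing
  new_run <- c(TRUE, hot[-1] != hot[-length(hot)] |
                     as.numeric(diff(date)) != 1)
  run_id <- cumsum(new_run)

  # Number of rows in the run that each row belongs to
  run_len <- ave(seq_along(run_id), run_id, FUN = length)

  dat[hot & run_len >= min_len, ]
}

# Usage with the question's data frame:
# HW <- heatwave_days(Mydata)
```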
Answer
An approach with data.table, slightly different from @jlhoward's approach (using the same data):
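library(data.table)
setDT(df)  # convert the data.frame to a data.table in place
# hotday: 1 if both MAX and MIN reach the thresholds, else 0 (fixed thresholds
# for this sample; the question's definition uses quantile(MAX, 0.9) etc.)
df[, hotday := +(MAX >= 44.5 & MIN >= 24.5)
   # hw.length: length of the run of equal hotday values each row is in
   ][, hw.length := with(rle(hotday), rep(lengths, lengths))
   # runs of non-hot days are not heat waves
   ][hotday == 0, hw.length := 0L]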
This produces a data.table with a heat-wave length variable (hw.length, the number of consecutive hot days in the run a row belongs to, and 0 on non-hot days) instead of a TRUE/FALSE variable for one specific heat-wave length:
> df
YEAR MONTH DAY MAX MIN hotday hw.length
1: 1989 7 18 45.0 23.5 0 0
2: 1989 7 19 44.2 26.1 0 0
3: 1989 7 20 44.7 24.4 0 0
4: 1989 7 21 44.6 29.5 1 1
5: 1989 7 22 44.4 31.6 0 0
6: 1989 7 23 44.2 26.7 0 0
7: 1989 7 24 44.5 25.0 1 3
8: 1989 7 25 44.8 26.0 1 3
9: 1989 7 26 44.8 24.6 1 3
10: 1989 7 27 45.0 24.3 0 0
11: 1989 7 28 44.8 26.0 1 1
12: 1989 7 29 44.4 24.0 0 0
13: 1989 7 30 45.2 25.0 1 1
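The answer stops at the hw.length column. A possible final step, sketched here as an assumption rather than taken from the answer, is to keep the rows of heat waves lasting three or more days and count how many such heat waves there are. This sketch rebuilds the sample data from the output above, keeps the answer's fixed thresholds (44.5 and 24.5) in place of the 90th percentiles, and numbers runs with data.table's `rleid()` instead of `rle()`; the names `run`, `heatwaves` and `n_heatwaves` are illustrative. Like the answer, it assumes the rows are consecutive calendar days with no gaps.

```r
library(data.table)

# Sample rows from the output above (July 18-30, 1989)
df <- data.table(
  YEAR = 1989, MONTH = 7, DAY = 18:30,
  MAX = c(45.0, 44.2, 44.7, 44.6, 44.4, 44.2, 44.5, 44.8, 44.8, 45.0, 44.8, 44.4, 45.2),
  MIN = c(23.5, 26.1, 24.4, 29.5, 31.6, 26.7, 25.0, 26.0, 24.6, 24.3, 26.0, 24.0, 25.0)
)

df[, hotday := +(MAX >= 44.5 & MIN >= 24.5)]  # 1 = hot day, 0 = not
df[, run := rleid(hotday)]                   # new id each time hotday changes
df[, hw.length := hotday * .N, by = run]     # run length on hot days, 0 otherwise

heatwaves <- df[hw.length >= 3]              # rows in heat waves of 3+ days
n_heatwaves <- uniqueN(heatwaves$run)        # number of distinct heat waves
```

On this sample, hw.length matches the output above, the only heat wave is July 24-26, and n_heatwaves is 1.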