How to subset consecutive rows if they meet a condition

Question

I am using R to analyze a number of time series (1951-2013) containing daily values of Max and Min temperatures. The data has the following structure:

YEAR MONTH  DAY     MAX    MIN
1985     1    1    22.8    9.4
1985     1    2    28.6   11.7
1985     1    3    24.7   12.2
1985     1    4    17.2    8.0
1985     1    5    17.9    7.6
1985     1    6    17.7    8.1

I need to find the frequency of heat waves based on this definition: a period of three or more consecutive days with daily maximum and minimum temperatures exceeding the 90th percentiles of the maximum and minimum temperatures, respectively, for all days in the studied period.

Basically, I want to subset those consecutive days (three or more) when the Max and Min temp exceed a threshold value. The output would be something like this:

YEAR MONTH   DAY     MAX     MIN
1989     7    18    45.0    23.5
1989     7    19    44.2    26.1
1989     7    20    44.7    24.4
1989     7    21    44.6    29.5
1989     7    24    44.4    31.6
1989     7    25    44.2    26.7
1989     7    26    44.5    25.0
1989     7    28    44.8    26.0
1989     7    29    44.8    24.6
1989     8    19    45.0    24.3
1989     8    20    44.8    26.0
1989     8    21    44.4    24.0
1989     8    22    45.2    25.0

I have tried the following to subset my full dataset to just the days that exceed the 90th percentile temperature:

HW<- subset(Mydata, Mydata$MAX >= (quantile(Mydata$MAX,.9)) &
                    Mydata$MIN >= (quantile(Mydata$MIN,.9)))

However, I am stuck on how to subset only the consecutive days that meet the condition.

Recommended answer

An approach with data.table which is slightly different from @jlhoward's approach (using the same data):

library(data.table)

setDT(df)
df[, hotday := +(MAX>=44.5 & MIN>=24.5)
   ][, hw.length := with(rle(hotday), rep(lengths,lengths))
     ][hotday == 0, hw.length := 0]

This produces a data.table with a heat wave length variable (hw.length) instead of a TRUE/FALSE variable for a specific heat wave length:

> df
    YEAR MONTH DAY  MAX  MIN hotday hw.length
 1: 1989     7  18 45.0 23.5      0         0
 2: 1989     7  19 44.2 26.1      0         0
 3: 1989     7  20 44.7 24.4      0         0
 4: 1989     7  21 44.6 29.5      1         1
 5: 1989     7  22 44.4 31.6      0         0
 6: 1989     7  23 44.2 26.7      0         0
 7: 1989     7  24 44.5 25.0      1         3
 8: 1989     7  25 44.8 26.0      1         3
 9: 1989     7  26 44.8 24.6      1         3
10: 1989     7  27 45.0 24.3      0         0
11: 1989     7  28 44.8 26.0      1         1
12: 1989     7  29 44.4 24.0      0         0
13: 1989     7  30 45.2 25.0      1         1
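
The hw.length column still has to be filtered to get the subset the question asks for. The following is a minimal sketch of that last step, not part of the original answer. It rebuilds the sample data from the output above and keeps the answer's fixed thresholds (44.5 and 24.5). The names max_thr, min_thr, min_days and run_id are introduced here for illustration only. It also assumes the data has one row per calendar day, sorted by date, because run-length encoding treats adjacent rows as consecutive days.

```r
library(data.table)

# Sample data copied from the output shown above
df <- data.table(
  YEAR  = 1989,
  MONTH = 7,
  DAY   = 18:30,
  MAX   = c(45.0, 44.2, 44.7, 44.6, 44.4, 44.2, 44.5,
            44.8, 44.8, 45.0, 44.8, 44.4, 45.2),
  MIN   = c(23.5, 26.1, 24.4, 29.5, 31.6, 26.7, 25.0,
            26.0, 24.6, 24.3, 26.0, 24.0, 25.0)
)

# Fixed thresholds as in the answer. On real data these could instead be
# quantile(df$MAX, 0.9) and quantile(df$MIN, 0.9), as in the question.
max_thr  <- 44.5
min_thr  <- 24.5
min_days <- 3   # illustrative name: shortest run that counts as a heat wave

# Flag hot days (1/0), then give every run of identical flags its own id
df[, hotday := +(MAX >= max_thr & MIN >= min_thr)]
df[, run_id := rleid(hotday)]

# Run length for hot runs, 0 for runs of non-hot days
df[, hw.length := hotday * .N, by = run_id]

# Keep only the days that belong to a heat wave, and count the heat waves
heatwaves   <- df[hw.length >= min_days]
n_heatwaves <- uniqueN(heatwaves$run_id)

print(heatwaves)
print(n_heatwaves)
```

On the sample data this keeps the three rows for 24 to 26 July and counts one heat wave. If the data has missing days, the rows would need to be completed or grouped by date gaps first, otherwise days separated by a gap would be counted as consecutive.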
