for loop in irregular time series
Problem description
I'm looking for advice on how to loop through the following, which is a subset of a much larger data set. I hope the following representation works.
mydf <- structure(list(site_id = c("39ADA00070", "39ADA00070", "39ADA00070",
"39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070",
"39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070",
"39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070",
"39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070",
"39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070", "39ADA00070",
"39ADA00070", "39ADA00070", "39ALL00184", "39ALL00184", "39ALL00184",
"39ALL00184", "39ALL00184", "39ALL00184", "39ALL00184", "39ALL00184",
"39ALL00184", "39ALL00184", "39ALL00184", "39ALL00184", "39ALL00184",
"39ALL00184", "39ALL00184", "39ALL00184", "39ALL00184", "39ALL00184",
"39ALL00184", "39ALL00184", "39ALL00184", "39ALL00184", "39ALL00184"
), date = structure(c(6339, 8594, 9293, 9441, 10014, 10604, 11080,
11821, 12717, 12907, 13081, 13277, 13459, 13635, 13822, 14012,
14207, 14207, 14355, 14564, 14704, 14917, 15105, 15271, 15478,
15644, 15833, 15834, 16009, 16203, 7783, 8406, 8554, 8686, 9034,
9260, 9632, 9777, 10002, 10491, 10491, 11060, 11585, 12145, 12145,
12696, 13242, 13242, 13775, 14363, 14881, 15428, 15974), class = "Date"),
var1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 159L, 148L,
149L, 134L, 179L, 205L, 193L, 109L, 109L, 177L, 75L, 272L,
150L, 115L, 232L, 230L, 183L, 159L, 159L, 304L, 220L, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
-98L, -98L, -38L, -74L, -74L, -80L, -48L), var2 = c(NA, NA,
NA, NA, NA, NA, NA, NA, 16.8, 16.8, 14.5, 14.2, 15.1, 14.5,
15, 15.2, 13.2, 13.2, 15, 15.2, 15.1, 14.4, 14.8, 15.2, 16.3,
NA, 14.3, 14.3, 15.6, 14.8, NA, 12, 14.7, NA, 14.6, NA, 13.7,
12.3, 12.5, 13.5, 13.5, 12.5, 13.1, 14.2, 14.2, 14.1, 12.5,
12.5, 13.5, 12.7, 12.6, 12.5, 12.6), var3 = c(NA, NA, NA,
NA, NA, NA, NA, NA, 7.35, 7.85, 7.5, 7.47, 7.62, 7.08, 7.08,
7.2, 7.4, 7.4, 7.26, 7.05, 6.56, 7.2, 7.42, 6.5, 7.81, 8.43,
7.57, 7.57, 7.42, 7.72, NA, 6.58, 6.8, NA, 7.75, NA, 7.06,
6.77, 6.41, 6.84, 6.84, 7.85, 7.13, 7.26, 7.26, 7.06, 7.14,
7.14, 7.11, 6.9, 7.11, 7.2, 7.1), var4 = c(NA, 283L, 216L,
223L, 256L, 165L, 192L, 216L, 173L, 216L, 179L, 282L, 146L,
227L, 141L, 210L, 160L, 162L, 157L, 140L, 235L, 166L, 216L,
NA, 162L, 193L, 286L, 274L, 163L, 209L, NA, 304L, 321L, 293L,
398L, 302L, 301L, 282L, 288L, 292L, 292L, 302L, 515L, 309L,
309L, 323L, 338L, 295L, 280L, 279L, 325L, 328L, 322L), var5 = c(NA,
NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), var6 = c(NA, NA,
29L, 32L, 36L, 24L, 25L, 29L, 27L, 27L, 24L, 32L, 21L, 27L,
21L, 26L, 23L, 24L, 25L, 20L, 24L, 22L, 28L, 24L, 20L, 23L,
30L, 29L, 21L, 24L, 15L, 15L, 18L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L), var7 = c(NA, NA, 77, 83, 87, 66, 73, 73,
65, 76, 69, 93, 60, 76, 56, 77, 67, 68, 68, 60, 67, 63, 82,
69, 56, 68, 85, 83, 59, 68.2, 157, 159, 164, 169, 155, 176,
156, 156, 162, 162, 162, 160, 180, 163, 163, 158, 168, 171,
162, 167, 177, 167, 168), var8 = c(NA, NA, 25, 26, 29, 21,
22, 23, 20, 23, 21, 30, 17, 24, 16, 23, 20, 20, 21, 17, 23,
18, 25, 20, 17, 21, 27, 27, 17, 20.9, 91, 89, 96, 92, 86,
100, 89, 91, 92, 94, 94, 91, 97, 91, 91, 92, 98, 99, 94,
100, 106, 98, 100), var9 = c(1.02, 1, 0.37, 0.48, 0.88, 0.16,
0.17, 0.24, 0.25, 5.98, 0.26, 0.54, 0, 0.19, 0, 0.18, 0.14,
0.13, 0.16, 0.11, 0.19, 0.16, 0.26, NA, 0.11, 0.27, 0.19,
0.19, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, NA, 0.1, 0.1, 0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0, 0, 0.1, 0.1,
0.1), var10 = c(50, 48, 64, 55, 52, 64, 69, 63.3, 56.1, 40.6,
58.6, 43.9, 62.2, 51.9, 55.6, 53.4, 61.3, 61, 61.1, 61.9,
51.5, 60.7, 52.2, NA, 66, 52.8, 46.8, 47.5, 59.2, 53.4, NA,
560, 650, 540, 548, 655, 565, 531, 540, 501, 501, 531, 535,
547, 547, 492, 537, 542, 512, 542, 548, 581, 540)), class = "data.frame", row.names = c(NA,
-53L), .Names = c("site_id", "date", "var1", "var2", "var3",
"var4", "var5", "var6", "var7", "var8", "var9", "var10"))
This data.frame is a set of irregular time series using site_id as the main ID factor, date as the date, and then 10 variables. The actual data.frame has hundreds of IDs and dozens of factors.
I know I can access each time series by site_id using, for example
mydf[mydf$site_id == '39ADA00070', ][,3]
to get var1 for the first site_id.
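As a small sketch of the same kind of per-site access, base R's split() breaks a data.frame into one piece per site_id, so each series can be pulled out by name. The toy data frame and its values are invented for illustration and only mimic the layout of mydf.

```r
# Hypothetical toy data with the same layout idea as mydf
# (an ID column plus a measurement column); values are made up.
toy <- data.frame(
  site_id = c("A", "A", "B"),
  var1    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# One data.frame per site, stored in a named list
by_site <- split(toy, toy$site_id)

names(by_site)        # the site IDs
by_site[["A"]]$var1   # var1 for site "A"

# Equivalent direct subset, selecting the column by name instead of position
toy[toy$site_id == "A", "var1"]
```

Selecting the column by name rather than by position (3) keeps the code working if columns are later reordered.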
What I am looking for is a robust for loop to run through the data.frame:
for (i in 1:length(site_id)) {
  # perform something on var1 through var10
  # output matrix of that something
}
That something would be any number of tests or plots, e.g.
getOutliers() (from the extremevalues package)
various plots from ggplot2
etc.
But first, I just need help getting the assignment of the for loop indices correct.
I am not against using apply (ddply) tools for this work, but I wanted to start with a basic for loop. Then I can clean up by addressing NAs, censored values, etc.
Thanks so much!
Recommended answer
Try the following:
for (ss in unique(mydf$site_id)) {
  for (cc in 3:12) {
    # replace max() with whatever function you need
    print(max(mydf[mydf$site_id == ss, cc], na.rm = TRUE))
  }
}
[1] 304
[1] 16.8
[1] 8.43
[1] 286
[1] 2
[1] 36
[1] 93
[1] 30
[1] 5.98
[1] 69
[1] -38
[1] 14.7
[1] 7.85
[1] 515
[1] 2
[1] 18
[1] 180
[1] 106
[1] 0.1
[1] 655
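Since the question asks for a matrix of results rather than printed values, here is one possible sketch: preallocate a sites-by-variables matrix and fill it inside the same double loop. The toy data frame and the names res, res2, sites and vars are made up for illustration, and max() is only a stand-in for whatever function is needed. The last lines show an apply-style equivalent using split() and sapply().

```r
# Hypothetical toy data standing in for mydf: site_id, date, then
# measurement columns. Values are invented for illustration.
toy <- data.frame(
  site_id = c("A", "A", "A", "B", "B"),
  date    = as.Date(c("2001-01-01", "2001-03-15", "2002-07-04",
                      "2001-02-01", "2003-05-20")),
  var1    = c(NA, 159, 148, -98, -38),
  var2    = c(16.8, 14.5, NA, 12.5, 13.5),
  stringsAsFactors = FALSE
)

sites <- unique(toy$site_id)
vars  <- setdiff(names(toy), c("site_id", "date"))

# Preallocate a sites x variables matrix, then fill it in the loop
res <- matrix(NA_real_, nrow = length(sites), ncol = length(vars),
              dimnames = list(sites, vars))

for (ss in sites) {
  for (vv in vars) {
    res[ss, vv] <- max(toy[toy$site_id == ss, vv], na.rm = TRUE)
  }
}

res

# Apply-style equivalent: split the measurement columns by site,
# summarise each column, and transpose so rows are sites
res2 <- t(sapply(split(toy[vars], toy$site_id),
                 function(d) sapply(d, max, na.rm = TRUE)))
```

Looping over column names instead of 3:12 avoids hard-coding positions. Note that max(..., na.rm = TRUE) returns -Inf with a warning when a site has only NA values for a variable, so such cases may need separate handling.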