Interpolate variables on subsets of dataframe
Question
I have a large dataframe containing survey observations from multiple states over several years. Here's the data structure:
state | survey.year | time1 | obs1 | time2 | obs2
CA | 2000 | 1 | 23 | 1.2 | 43
CA | 2001 | 2 | 43 | 1.4 | 52
CA | 2002 | 5 | 53 | 3.2 | 61
...
CA | 1998 | 3 | 12 | 2.3 | 20
CA | 1999 | 4 | 14 | 2.8 | 25
CA | 2003 | 5 | 19 | 4.3 | 29
...
ND | 2000 | 2 | 223 | 3.2 | 239
ND | 2001 | 4 | 233 | 4.2 | 321
ND | 2003 | 7 | 256 | 7.9 | 387
For each state/survey.year combination, I would like to interpolate obs2 so that its time location lines up with (time1, obs1).
That is, I would like to break up the dataframe into state/survey.year chunks, perform my linear interpolation, and then stitch the individual state/survey.year dataframes back together into a master dataframe.
I have been trying to figure out how to use the plyr and Hmisc packages for this, but I keep getting myself in a tangle.
Here's the code that I wrote to do the interpolation:
require(Hmisc)
df <- new.obs2 <- NULL
for (i in 1:(0.5*(ncol(indirect)-1))){
df[,"new.obs2"] <- approxExtrap(df[,"time1"],
df[,"obs1"],
xout = df[,"obs2"],
method="linear",
rule=2)
}
But I am not sure how to unleash plyr on this problem. Your advice and suggestions would be much appreciated. Essentially, I am just trying to interpolate "obs2" within each state/survey.year combination, so that its time references line up with those of "obs1".
Of course, if there's a slick way to do this without invoking plyr functions, then I'd be open to that...
Thanks!
Recommended answer
This should be as simple as:
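library(plyr)   # ddply
library(Hmisc)  # approxExtrap
# approxExtrap returns a list(x, y); keep only the interpolated y values.
ddply(df, .(state, survey.year), transform,
      new.obs2 = approxExtrap(time1, obs1, xout = obs2,
                              method = "linear",
                              rule = 2)$y)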
But I can't promise you anything, since I haven't the foggiest idea what the point of your for loop is. (It's overwriting df[,"new.obs2"] each time through the loop? You initialize the entire data frame df to NULL? What's indirect?)
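For what it's worth, here is a minimal base R sketch of the split / interpolate / stitch idea described in the question, without plyr. It assumes the goal is to evaluate the (time2, obs2) series at the time1 positions, which is one reading of the question and not something the asker confirmed. The `survey` data frame and the `interpolate_group` helper are hypothetical names made up for illustration, and the sample values are only loosely based on the table above. It uses `approx` from base R, where `rule = 2` holds the nearest end value constant outside the observed range; `Hmisc::approxExtrap(...)$y` could be swapped in if linear extrapolation is wanted instead.

```r
# Hypothetical sample data, shaped like the table in the question
# (several rows per state/survey.year combination is an assumption).
survey <- data.frame(
  state       = c("CA", "CA", "CA", "ND", "ND", "ND"),
  survey.year = c(2000, 2000, 2000, 2001, 2001, 2001),
  time1       = c(1, 2, 5, 2, 4, 7),
  obs1        = c(23, 43, 53, 223, 233, 256),
  time2       = c(1.2, 1.4, 3.2, 3.2, 4.2, 7.9),
  obs2        = c(43, 52, 61, 239, 321, 387)
)

# Hypothetical helper: interpolate obs2 (observed at time2) onto time1
# within a single state/survey.year chunk.
interpolate_group <- function(chunk) {
  # approx() needs at least two non-NA (time2, obs2) points per chunk.
  chunk$new.obs2 <- approx(chunk$time2, chunk$obs2,
                           xout = chunk$time1,
                           method = "linear", rule = 2)$y
  chunk
}

# Split into chunks, interpolate each one, then stitch them back together.
chunks <- split(survey, list(survey$state, survey$survey.year), drop = TRUE)
result <- do.call(rbind, lapply(chunks, interpolate_group))
rownames(result) <- NULL
```

With plyr loaded, the last three lines should be roughly equivalent to `ddply(survey, .(state, survey.year), interpolate_group)`, since `ddply` applies a function to each chunk and row-binds the results.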