Can `ddply` (or similar) do a sliding window?

Question

Something like

library(plyr)  # for ldply()

# apply f to each block of n consecutive rows and row-bind the results
sliding = function(df, n, f)
    ldply(1:(nrow(df) - n + 1), function(k)
        f(df[k:(k + n - 1), ])
    )

used like this:

> df
  n         a
1 1 0.8021891
2 2 0.9446330
...

> sliding(df, 2, function(df) with(df,
+     data.frame(n = n[1], a = a[1], b = sum(n - a))
+ ))
  n         a        b
1 1 0.8021891 1.253178
...

Is there something like this, but built directly into ddply, so that I get the nice syntactic sugar that comes with it?
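
For reference, sliding() above is essentially lapply() over the window start positions followed by a row-bind, which is the part ldply() takes care of. A minimal base-R sketch of the same idea without plyr; the name sliding_base and the toy data are hypothetical:

```r
# Hypothetical base-R equivalent of sliding(): apply f to every window of
# n consecutive rows, then row-bind the per-window data frames.
sliding_base <- function(df, n, f) {
  starts <- seq_len(nrow(df) - n + 1)   # first row of each window
  pieces <- lapply(starts, function(k) f(df[k:(k + n - 1), , drop = FALSE]))
  do.call(rbind, pieces)                # the "combine" step ldply() does for you
}

# Toy data (hypothetical), with the same column names as in the question.
df <- data.frame(n = 1:4, a = c(2, 0, 1, 5))
res <- sliding_base(df, 2, function(w) with(w,
  data.frame(n = n[1], a = a[1], b = sum(n - a))))
res$b
# [1] 1 4 1
```

Inside with(w, ...), n refers to the column w$n, not to the window width, exactly as in the question's usage.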

Answer

Since no answer has been posted to this question yet, I thought I'd put one up to make the case that there is an even better way to approach this type of problem, one that can also be thousands of times faster. (If this isn't helpful, please let me know, but I thought it would be better than nothing.)

Whenever I hear "moving average" or "sliding window", FFT convolution immediately comes to mind. A sliding sum is a convolution of the data with a kernel, and computing that convolution through the fast Fourier transform handles every window at once in O(N log N) time instead of looping over the windows one by one. Since all the "sliding" is done behind the scenes, I think it also has all the syntactic beauty you could ever ask for.

(The following code is available in one file at https://gist.github.com/1320175)

We start by simulating some data (I'm using integers here for simplicity, but of course you don't need to).

require(plyr)      # only needed for the ldply-based sliding() comparison
set.seed(12345)

n = 10             # number of rows
n.sum = 2          # window width
a = sample.int(10, n, replace=TRUE)

df = data.frame(n=1:n, a)

> df
    n  a
1   1  8
2   2  9
3   3  8
4   4  9
5   5  5
6   6  2
7   7  4
8   8  6
9   9  8
10 10 10
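
R 3.6.0 changed the default algorithm behind sample(), so set.seed(12345) on a current R will likely not produce the values printed above. The simplest way to follow along exactly is to hard-code them. Alternatively, set.seed(12345, sample.kind = "Rounding") switches back to the old sampler (R warns that it is non-uniform); that it reproduces these exact values is an assumption here, not something the answer states.

```r
# Values copied from the printout above, so the rest of the example does
# not depend on the R version's sampling algorithm.
n = 10
n.sum = 2
a = c(8, 9, 8, 9, 5, 2, 4, 6, 8, 10)
df = data.frame(n = 1:n, a)
```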

Now we precompute n - a for every row in one vectorized step.

n.minus.a = with(df, n - a)
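
Once n - a is a plain numeric vector, a sliding sum over it needs no data-frame machinery at all. As a point of comparison (a different technique from the FFT approach in this answer), here is a minimal sketch of the running-total trick: every window sum is the difference of two cumulative sums, which costs one pass over the data for any window width. The helper name window_sum is hypothetical.

```r
# Hypothetical helper: sums over every window of w consecutive elements,
# computed as differences of one cumulative sum (O(length(x)) work).
window_sum <- function(x, w) {
  cs <- cumsum(c(0, x))                  # cs[i + 1] = x[1] + ... + x[i]
  N  <- length(x)
  cs[(w + 1):(N + 1)] - cs[1:(N - w + 1)]
}

n.minus.a <- c(-7, -7, -5, -5, 0, 4, 3, 2, 1, 0)   # n - a for the data above
window_sum(n.minus.a, 2)
# [1] -14 -12 -10  -5   4   7   5   3   1
```

For long vectors of non-integer doubles the cumulative sum can accumulate rounding error, which is one reason to prefer a dedicated filtering routine in practice.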

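Next, define a kernel k: n.sum ones followed by zeros, padded to the length of the data. Convolving our input n.minus.a with it sums each window of n.sum consecutive values; other kernels would average, smooth, or do whatever else to our data.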

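k = rep(0, n)      # same length as the data, as circular convolution requires
k[1:n.sum] = 1     # n.sum ones: each output sums n.sum consecutive inputs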
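
With n = 10 and n.sum = 2 this is just a "boxcar" of two ones. Scaling it so the weights sum to 1 would turn the sliding sum into a sliding mean; a short sketch of both (the name k.mean is hypothetical):

```r
n = 10
n.sum = 2
k = rep(0, n)
k[1:n.sum] = 1
k
# [1] 1 1 0 0 0 0 0 0 0 0

# Hypothetical variant: same window, weights sum to 1, so convolving with it
# gives moving averages instead of moving sums.
k.mean = k / n.sum
k.mean
# [1] 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
```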

With everything set up, we can define a function to do this convolution efficiently in the frequency domain via fft().
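
For reference, the identity being used is the convolution theorem for the discrete Fourier transform. Written 0-indexed, with N = length(x) and w = n.sum, circular convolution with the boxcar kernel gives exactly the window sums as long as no index wraps around:

```latex
(x \circledast k)_i = \sum_{j=0}^{N-1} x_j \, k_{(i-j) \bmod N}
  = \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{x\} \cdot \mathcal{F}\{k\} \right\}_i,
  \qquad i = 0, \dots, N-1

% with k_0 = \dots = k_{w-1} = 1 and k_j = 0 otherwise:
(x \circledast k)_i = \sum_{m=0}^{w-1} x_{(i-m) \bmod N}
  = x_{i-w+1} + \dots + x_i \qquad \text{for } i \ge w-1
```

The first w - 1 outputs wrap around to the end of x and are discarded. Here \mathcal{F}^{-1} includes the 1/N factor; R's fft(..., inverse = TRUE) does not, hence the division by N.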

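myConv <- function(x, k, w = sum(k != 0)){
  # w: window width, i.e. the number of ones in the boxcar kernel (n.sum)
  N   = length(x)
  Fx  = fft(x)                    # transform the data
  Fk  = fft(k)                    # transform the kernel (same length as x)
  Fxk = Fx * Fk                   # pointwise product = circular convolution
  xk  = fft(Fxk, inverse=TRUE)    # R's inverse fft() is unnormalized...
  (Re(xk) / N)[w:N]               # ...so divide by N; drop the w - 1 wrapped-around entries
}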
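
To make the wrap-around visible, a minimal sketch comparing the FFT route with circular convolution evaluated straight from its definition in O(N^2) time. The helper names circ_conv_direct and circ_conv_fft are hypothetical, and the inputs are the example's n - a and kernel typed in literally:

```r
# Hypothetical helper: circular convolution from the definition,
# y[i] = sum over j of x[j] * k[(i - j) mod N], shifted to R's 1-based indexing.
circ_conv_direct <- function(x, k) {
  N <- length(x)
  sapply(0:(N - 1), function(i) {
    j <- 0:(N - 1)
    sum(x[j + 1] * k[((i - j) %% N) + 1])
  })
}

# Hypothetical helper: the same thing via the FFT
# (R's inverse fft() is unnormalized, so divide by N).
circ_conv_fft <- function(x, k) {
  Re(fft(fft(x) * fft(k), inverse = TRUE)) / length(x)
}

x <- c(-7, -7, -5, -5, 0, 4, 3, 2, 1, 0)   # n - a from the example
k <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)       # n.sum = 2 boxcar kernel
all.equal(circ_conv_fft(x, k), circ_conv_direct(x, k))
# [1] TRUE
circ_conv_direct(x, k)
# [1]  -7 -14 -12 -10  -5   4   7   5   3   1
```

The first entry, -7 = x[1] + x[10], is the wrapped-around term that myConv drops; the remaining nine are the sliding sums.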

The syntax to execute this is nice and simple:

> myConv(n.minus.a, k)
[1] -14 -12 -10  -5   4   7   5   3   1

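The same thing happens under the hood when you use R's convolve() convenience function. Its default conj = TRUE multiplies by the complex conjugate of fft(k), which effectively reverses the kernel, so the window sums land in the first length(x) - n.sum + 1 positions and are simply taken from the front: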

> convolve(n.minus.a, k)[1:(length(n.minus.a)-n.sum+1)]
[1] -14 -12 -10  -5   4   7   5   3   1
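
A short check of that equivalence, reproducing by hand what convolve(x, k) computes for type = "circular" with the default conj = TRUE (the data are the example values typed in literally):

```r
x <- c(-7, -7, -5, -5, 0, 4, 3, 2, 1, 0)   # n - a from the example
k <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)       # n.sum = 2 boxcar kernel
N <- length(x)
w <- 2                                     # window width (n.sum)

# convolve() multiplies by the conjugate of fft(k), not by fft(k) itself
manual <- Re(fft(fft(x) * Conj(fft(k)), inverse = TRUE)) / N
all.equal(convolve(x, k), manual)
# [1] TRUE

# With the conjugate, entry i is x[i] + x[i + 1] (wrapping only at the very end),
# so the sliding sums are the first N - w + 1 entries. round() strips float noise.
round(convolve(x, k)[1:(N - w + 1)])
# [1] -14 -12 -10  -5   4   7   5   3   1
```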

We now compare this to the manual method to show that the results are all equivalent:

> sliding(df, 2, function(df) with(df, data.frame(n = n[1], a = a[1], b = sum(n - a))))
  n a   b
1 1 8 -14
2 2 9 -12
3 3 8 -10
4 4 9  -5
5 5 5   4
6 6 2   7
7 7 4   5
8 8 6   3
9 9 8   1

Finally, we rebuild the inputs with n=10^4 and test all these methods for speed:

> system.time(myConv(n.minus.a, k))
   user  system elapsed 
  0.002   0.000   0.002 
> system.time(convolve(n.minus.a, k, type='circ')[1:(length(n.minus.a)-n.sum+1)])
   user  system elapsed 
  0.002   0.000   0.002 
> system.time(sliding(df, 2, function(df) with(df, data.frame(n = n[1], a = a[1], b = sum(n - a)))))
   user  system elapsed 
  7.944   0.018   7.962 

The FFT methods return almost instantaneously and, even with this rough timing, are almost 4000 times faster than the manual method (0.002 s versus 7.962 s elapsed).
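
The answer does not show how the n = 10^4 inputs were built. A plausible sketch, assuming the same recipe as the 10-row example simply re-run with a larger n (the sampled values will differ between R versions, and timings depend on the machine):

```r
# Assumption: same construction as the small example, with n = 10^4.
set.seed(12345)
n = 10^4
n.sum = 2
a = sample.int(10, n, replace = TRUE)
df = data.frame(n = 1:n, a)

n.minus.a = with(df, n - a)   # computed once for all rows
k = rep(0, n)                 # kernel padded to the data length
k[1:n.sum] = 1

# One sliding sum per window via the FFT-based convolve();
# round() only strips floating-point noise, since the true sums are integers.
fast = round(convolve(n.minus.a, k, type = "circular")[1:(n - n.sum + 1)])
length(fast)
# [1] 9999

# Machine-dependent; compare with the plyr-based sliding() if it is loaded.
system.time(convolve(n.minus.a, k, type = "circular"))
```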

Of course, not every sliding problem can be pigeon-holed into this paradigm: it works for linear window operations like this one using sum() (and also means, weighted averages, etc.), but not for non-linear ones such as a running median or maximum. At any rate, it is usually well worth googling a bit to see whether there is a filter kernel that will do the trick for a given problem. Good luck!
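
One ready-made tool of this kind in base R is stats::filter(), which applies a linear filter (a convolution with given weights) directly; with sides = 1 each output uses the current and preceding values. A minimal sketch on the example's n - a, where the weights rep(1, 2) correspond to n.sum = 2. Non-linear window statistics are a different matter; for the running median specifically, base R has runmed().

```r
x <- c(-7, -7, -5, -5, 0, 4, 3, 2, 1, 0)   # n - a from the example

# Moving sum over windows of 2: y[i] = x[i] + x[i - 1]; y[1] is NA (incomplete window).
msum <- stats::filter(x, rep(1, 2), method = "convolution", sides = 1)
as.numeric(msum)
# [1]  NA -14 -12 -10  -5   4   7   5   3   1

# Moving mean: same window, weights that sum to 1.
mmean <- stats::filter(x, rep(1 / 2, 2), sides = 1)
as.numeric(mmean)[-1]
# [1] -7.0 -6.0 -5.0 -2.5  2.0  3.5  2.5  1.5  0.5
```

The stats:: prefix avoids a clash with dplyr::filter() if dplyr happens to be attached.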
