在Numpy/Scipy中“沿路径"中的快速线性插值被称为“沿路径". [英] Fast linear interpolation in Numpy / Scipy "along a path"

查看:102
本文介绍了在Numpy/Scipy中“沿路径"中的快速线性插值被称为“沿路径".的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

比方说,我有一座山上3个(已知)高度的气象站的数据.具体来说,每个站点每分钟都会在其位置记录一次温度测量值.我想执行两种插值.而且我希望能够快速执行每项操作.

Let's say that I have data from weather stations at 3 (known) altitudes on a mountain. Specifically, each station records a temperature measurement at its location every minute. I have two kinds of interpolation I'd like to perform. And I'd like to be able to perform each quickly.

因此,让我们设置一些数据:

So let's set up some data:

import numpy as np
from scipy.interpolate import interp1d
import pandas as pd
import seaborn as sns

np.random.seed(0)
N, sigma = 1000., 5

basetemps = 70 + (np.random.randn(N) * sigma)
midtemps = 50 + (np.random.randn(N) * sigma)
toptemps = 40 + (np.random.randn(N) * sigma)
alltemps = np.array([basetemps, midtemps, toptemps]).T # note transpose!
trend = np.sin(4 / N * np.arange(N)) * 30
trend = trend[:, np.newaxis]

altitudes = np.array([500, 1500, 4000]).astype(float)

finaltemps = pd.DataFrame(alltemps + trend, columns=altitudes)
finaltemps.index.names, finaltemps.columns.names = ['Time'], ['Altitude']
finaltemps.plot()

太好了,所以我们的温度看起来像这样:

Great, so our temperatures look like this:

我认为这很简单.假设我想每次将温度提高到1000.我可以只使用内置的scipy插值方法:

I think this one is pretty straightforward. Say I want to get the temperature at an altitude of 1,000 for each time. I can just use built in scipy interpolation methods:

interping_function = interp1d(altitudes, finaltemps.values)
interped_to_1000 = interping_function(1000)

fig, ax = plt.subplots(1, 1, figsize=(8, 5))
finaltemps.plot(ax=ax, alpha=0.15)
ax.plot(interped_to_1000, label='Interped')
ax.legend(loc='best', title=finaltemps.columns.name)

这很好用.让我们来看看速度:

This works nicely. And let's see about speed:

%%timeit
res = interp1d(altitudes, finaltemps.values)(1000)
#-> 1000 loops, best of 3: 207 µs per loop

插值沿路径":

所以现在我有第二个相关问题.假设我知道远足聚会的海拔高度是时间的函数,并且我想通过对时间进行线性插值来计算其(移动)位置的温度. 尤其是,我知道远足聚会地点的时间与我知道气象站温度的时间相同时间.我也可以做到这一点很大的努力:

Interpolate "along a path":

So now I have a second, related problem. Say I know the altitude of a hiking party as a function of time, and I want to compute the temperature at their (moving) location by linearly interpolating my data through time. In particular, the times at which I know the location of the hiking party are the same times at which I know the temperatures at my weather stations. I can do this without too much effort:

location = np.linspace(altitudes[0], altitudes[-1], N)
interped_along_path = np.array([interp1d(altitudes, finaltemps.values[i, :])(loc) 
                                             for i, loc in enumerate(location)])

fig, ax = plt.subplots(1, 1, figsize=(8, 5))
finaltemps.plot(ax=ax, alpha=0.15)
ax.plot(interped_along_path, label='Interped')
ax.legend(loc='best', title=finaltemps.columns.name)

因此,这确实很好用,但需要注意的是,上面的关键行是使用列表推导来隐藏大量工作.在前一种情况下,scipy为我们创建了一个插值函数,并在大量数据上对其进行了一次评估.在这种情况下,scipy实际上是构造N个单独的插值函数,并对少量数据进行一次评估.这感觉本质上是低效的.这里有一个for循环潜伏(在列表理解中),而且,这感觉很松弛.

So this works really nicely, but its important to note that the key line above is using list comprehension to hide an enormous amount of work. In the previous case, scipy is creating a single interpolation function for us, and evaluating it once on a large amount of data. In this case, scipy is actually constructing N individual interpolating functions and evaluating each once on a small amount of data. This feels inherently inefficient. There is a for loop lurking here (in the list comprehension) and moreover, this just feels flabby.

毫不奇怪,这比以前的情况要慢得多:

Not surprisingly, this is much slower than the previous case:

%%timeit
res = np.array([interp1d(altitudes, finaltemps.values[i, :])(loc) 
                            for i, loc in enumerate(location)])
#-> 10 loops, best of 3: 145 ms per loop

因此,第二个示例的运行速度比第一个示例慢1,000. IE.这与繁重的步骤是制作线性插值函数"步骤的想法是一致的……在第二个示例中发生了1,000次,而在第一个示例中发生了仅一次.

So the second example runs 1,000 slower than the first. I.e. consistent with the idea that the heavy lifting is the "make a linear interpolation function" step...which is happening 1,000 times in the second example but only once in the first.

因此,问题是:是否有更好的方法来解决第二个问题?例如,是否存在一种通过2维插值进行设置的好方法(也许可以处理这种情况)在哪里知道远足聚会的时间不是在(em>不是采样温度的时间)?还是有一种特别灵活的方式来处理时间排队的事情?还是其他?

So, the question: is there a better way to approach the second problem? For example, is there a good way to set it up with 2-dimensinoal interpolation (which could perhaps handle the case where the times at which the hiking party locations are known are not the times at which the temperatures have been sampled)? Or is there a particularly slick way to handle things here where the times do line up? Or other?

推荐答案

对于固定时间点,可以使用以下插值函数:

For a fixed point in time, you can utilize the following interpolation function:

g(a) = cc[0]*abs(a-aa[0]) + cc[1]*abs(a-aa[1]) + cc[2]*abs(a-aa[2])

其中,a是远足者的海拔高度,aa具有3个测量值的向量altitudescc是具有系数的向量.有三件事要注意:

where a is the hiker's altitude, aa the vector with the 3 measurement altitudes and cc is a vector with the coefficients. There are three things to note:

  1. 对于与aa相对应的给定温度(alltemps),可以通过使用np.linalg.solve()求解线性矩阵方程来确定cc.
  2. g(a)易于对(N,)维a和(N,3)维cc(分别包括np.linalg.solve())进行矢量化.
  3. g(a)被称为一阶单变量样条内核(三个点).使用abs(a-aa[i])**(2*d-1)会将样条顺序更改为d.这种方法可以解释为机器学习中的高斯过程的简化版本.
  1. For given temperatures (alltemps) corresponding to aa, determining cc can be done by solving a linear matrix equation using np.linalg.solve().
  2. g(a) is easy to vectorize for a (N,) dimensional a and (N, 3) dimensional cc (including np.linalg.solve() respectively).
  3. g(a) is called a first order univariate spline kernel (for three points). Using abs(a-aa[i])**(2*d-1) would change the spline order to d. This approach could be interpreted a simplified version of a Gaussian Process in Machine Learning.

所以代码应该是:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# generate temperatures
np.random.seed(0)
N, sigma = 1000, 5
trend = np.sin(4 / N * np.arange(N)) * 30
alltemps = np.array([tmp0 + trend + sigma*np.random.randn(N)
                     for tmp0 in [70, 50, 40]])

# generate attitudes:
altitudes = np.array([500, 1500, 4000]).astype(float)
location = np.linspace(altitudes[0], altitudes[-1], N)


def doit():
    """ do the interpolation, improved version for speed """
    AA = np.vstack([np.abs(altitudes-a_i) for a_i in altitudes])
    # This is slighty faster than np.linalg.solve(), because AA is small:
    cc = np.dot(np.linalg.inv(AA), alltemps)

    return (cc[0]*np.abs(location-altitudes[0]) +
            cc[1]*np.abs(location-altitudes[1]) +
            cc[2]*np.abs(location-altitudes[2]))


t_loc = doit()  # call interpolator

# do the plotting:
fg, ax = plt.subplots(num=1)
for alt, t in zip(altitudes, alltemps):
    ax.plot(t, label="%d feet" % alt, alpha=.5)
ax.plot(t_loc, label="Interpolation")
ax.legend(loc="best", title="Altitude:")
ax.set_xlabel("Time")
ax.set_ylabel("Temperature")
fg.canvas.draw()

测量时间可以得出:

In [2]: %timeit doit()
10000 loops, best of 3: 107 µs per loop

更新:我替换了doit()中的原始列表理解 导入速度提高了30%(对于N=1000).

Update: I replaced the original list comprehensions in doit() to import speed by 30% (For N=1000).

此外,按照比较的要求,我机器上的@moarningsun基准代码块:

Furthermore, as requested for comparison, @moarningsun's benchmark code block on my machine:

10 loops, best of 3: 110 ms per loop  
interp_checked
10000 loops, best of 3: 83.9 µs per loop
scipy_interpn
1000 loops, best of 3: 678 µs per loop
Output allclose:
[True, True, True]

请注意,N=1000是相对较小的数字.使用N=100000会产生结果:

Note that N=1000 is a relatively small number. Using N=100000 produces the results:

interp_checked
100 loops, best of 3: 8.37 ms per loop

%timeit doit()
100 loops, best of 3: 5.31 ms per loop

这表明,对于大型N而言,此方法的伸缩性要好于interp_checked方法.

This shows that this approach scales better for large N than the interp_checked approach.

这篇关于在Numpy/Scipy中“沿路径"中的快速线性插值被称为“沿路径".的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆