Interpolate rows simultaneously in Python

Question

I am trying to vectorize my code and have hit a roadblock. I have:

  • an n×d array of x values [[x1],[...],[xn]] (where each row [x1] has many points [x11, ..., x1d])
  • an n×d array of y values [[y1],[...],[yn]] (where each row [y1] has many points [y11, ..., y1d])
  • an n×1 array of x' values [[x'1],[...],[x'n]], for each of which I would like to interpolate a y value based on the corresponding row of x and y

The only thing I can think of is a list comprehension like [np.interp(x'[i,:], x[i,:], y[i,:]) for i in range(n)]. I'd like a faster vectorized option if one exists. Thanks for the help!

Solution

This is hardly an answer, but I guess it may still be useful to someone (if not, feel free to delete it). By the way, I think I misunderstood your question at first. What you have is a collection of n different one-dimensional datasets, or functions y(x), that you want to interpolate (correct me if I'm wrong).

As it turns out, doing this by multidimensional interpolation is a terrible approach. My idea was to add a new dimension to the data so that your datasets are mapped into one single dataset, in which the new dimension is what distinguishes the different xi, where i = 1, 2, ..., n. In other words, you assign a value in this new dimension, say z, to every row of x; this way, the different functions are correctly mapped into this higher-dimensional space.

However, this approach is slower than the np.interp list comprehension solution, by at least one order of magnitude on my computer. I guess it has to do with two-dimensional interpolation algorithms being at best of order O(n log(n)) (this is a guess); in this sense, it seems more efficient to perform many small interpolations on separate datasets than one big interpolation.
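
To make the added dimension concrete, here is a minimal sketch with made-up toy values (n = 2 rows, d = 3 points per row; the name `points` is only illustrative). Each x value is paired with the index of the row it came from, which plays the role of the z coordinate described above:

```python
import numpy as np

# Toy data (made up for illustration): n = 2 rows, d = 3 points per row
x = np.array([[0.0, 1.0, 2.0],
              [10.0, 11.0, 12.0]])

# One z value per row of x
z = np.arange(x.shape[0])

# Pair every x value with the z value of its row, giving (n*d, 2) points
points = np.column_stack((x.ravel(), np.repeat(z, x.shape[1])))
print(points)
```

Every row of `points` is an (x, z) pair, so values from different rows of x can never be confused with each other even when their x ranges overlap.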

Anyway, the full approach is shown in the following snippet:

import numpy as np
from scipy.interpolate import LinearNDInterpolator

def vectorized_interpolation(x, y, xq):
    """
    Vectorized option using LinearNDInterpolator
    """
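    # Dummy coordinate in the added dimension: one z value per row of x
    z = np.arange(x.shape[0])
    # Pair every x value with the z value of its row, giving (n*d, 2) points
    points = np.column_stack((x.ravel(), np.repeat(z, x.shape[1])))
    interpolant = LinearNDInterpolator(points, y.ravel())

    # Evaluate at (xq[i], z[i]), i.e. one query point per row
    return interpolant(xq, z)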

def non_vectorized_interpolation(x, y, xq):
    """
    Your non-vectorized solution
    """
    return np.array([np.interp(xq[i], x[i], y[i]) for i in range(x.shape[0])])

if __name__ == "__main__":
    n, d = 100, 500
    x = np.linspace(0, 2*np.pi, n*d).reshape((n, d))
    y = np.sin(x)
    xq = np.linspace(0, 2*np.pi, n)
    
    yq1 = vectorized_interpolation(x, y, xq)
    yq2 = non_vectorized_interpolation(x, y, xq)

The only advantage of the vectorized solution is that LinearNDInterpolator (like some of the other scipy.interpolate classes) explicitly computes the interpolant, so you can reuse it if you plan to interpolate the same datasets several times, avoiding repeated calculations. Another thing you could try is multiprocessing, if your machine has several cores, but that is not vectorizing, which is what you asked for. Sorry I can't be of more help.
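
The reuse mentioned above might look like the sketch below: the interpolant is built once and then evaluated for two different sets of query points. The toy data and the helper name `build_interpolant` are assumptions made for illustration, not part of the original snippet. The toy x values are also kept closely spaced relative to the unit spacing in z, which I assume is needed for the triangulation to interpolate along each row rather than across rows.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def build_interpolant(x, y):
    """Build the 2-D interpolant once (hypothetical helper)."""
    z = np.arange(x.shape[0])
    points = np.column_stack((x.ravel(), np.repeat(z, x.shape[1])))
    return LinearNDInterpolator(points, y.ravel()), z

# Toy data: 4 rows sharing the same x grid, with y = (row index + 1) * x
n, d = 4, 9
x = np.tile(np.linspace(0.0, 2.0, d), (n, 1))
y = (np.arange(n)[:, None] + 1) * x

interpolant, z = build_interpolant(x, y)  # expensive step, done once

# Two different query sets evaluated with the same interpolant
yq_a = interpolant(np.full(n, 0.6), z)
yq_b = interpolant(np.full(n, 1.3), z)

# Reference values from the row-by-row np.interp approach
ref_a = np.array([np.interp(0.6, x[i], y[i]) for i in range(n)])
ref_b = np.array([np.interp(1.3, x[i], y[i]) for i in range(n)])
print(np.max(np.abs(yq_a - ref_a)), np.max(np.abs(yq_b - ref_b)))
```

Whether this pays off depends on how many times the same datasets are queried, since building the interpolant is the costly part.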
