Efficient pairwise DTW calculation using numpy or cython


Question

I am trying to calculate the pairwise distances between multiple time series contained in a numpy array. See the code below:

print(type(sales))
print(sales.shape)

<class 'numpy.ndarray'>
(687, 157)

So sales contains 687 time series of length 157. I use pdist to calculate the DTW distances between the time series:

import fastdtw
import scipy.spatial.distance as sd

def my_fastdtw(sales1, sales2):
    return fastdtw.fastdtw(sales1,sales2)[0]

distance_matrix = sd.pdist(sales, my_fastdtw)

--- Tried doing it without pdist() ---

distance_matrix = []
m = len(sales)
for i in range(0, m - 1):
    for j in range(i + 1, m):
        # fastdtw returns (distance, path); keep only the distance
        distance_matrix.append(fastdtw.fastdtw(sales[i], sales[j])[0])
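
The nested loop above visits each unordered pair (i, j) with i < j exactly once, in the same order pdist uses for its condensed output. A minimal sketch of that bookkeeping, using only the standard library; `condensed_index` is a hypothetical helper name, and the index formula is the one stated in the SciPy pdist documentation:

```python
from itertools import combinations

def condensed_index(i, j, m):
    """Position of pair (i, j), i < j, in a condensed distance vector for m items."""
    return m * i + j - ((i + 2) * (i + 1)) // 2

m = 687
pairs = list(combinations(range(m), 2))  # (0, 1), (0, 2), ..., (685, 686)

# Same number of comparisons as the nested loop: m * (m - 1) / 2
assert len(pairs) == m * (m - 1) // 2
# The k-th pair produced by the loop lands at condensed index k
assert all(condensed_index(i, j, m) == k for k, (i, j) in enumerate(pairs))
```

So the list built by the loop and the vector returned by pdist should be directly comparable element by element.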

--- Parallelizing the inner for loop ---

from joblib import Parallel, delayed
import multiprocessing
import fastdtw

num_cores = multiprocessing.cpu_count() - 1
N = 687

def my_fastdtw(sales1, sales2):
    return fastdtw.fastdtw(sales1,sales2)[0]

results = [[] for i in range(N)]
for i in range(0, N - 1):
    results[i] = Parallel(n_jobs=num_cores)(
        delayed(my_fastdtw)(sales[i], sales[j]) for j in range(i + 1, N)
    )

All of these methods are very slow. The parallel method takes around 12 minutes. Can someone suggest an efficient way?

--- After following the steps mentioned in the answer below ---

Here is what the lib folder looks like:

VirtualBox:~/anaconda3/lib/python3.6/site-packages/fastdtw-0.3.2-py3.6-linux-x86_64.egg/fastdtw$ ls
_fastdtw.cpython-36m-x86_64-linux-gnu.so  fastdtw.py   __pycache__
_fastdtw.py                               __init__.py

So there is a cython version of fastdtw in there. I did not receive any errors during installation. Even so, when I press CTRL-C during program execution, the traceback shows that the pure-Python version (fastdtw.py) is being used:

/home/vishal/anaconda3/lib/python3.6/site-packages/fastdtw/fastdtw.py in fastdtw(x, y, radius, dist)

/home/vishal/anaconda3/lib/python3.6/site-packages/fastdtw/fastdtw.py in __fastdtw(x, y, radius, dist)

The code remains as slow as before.

Recommended answer

TL;DR

Your fastdtw failed to install the fast C++ version and silently falls back to a pure-Python version, which is slow.

You need to fix the installation of the fastdtw package.

The whole calculation is done inside fastdtw, so you cannot really speed it up from the outside. And parallelization in Python is not such an easy thing (yet?).

The fastdtw documentation says it needs about O(n) operations per comparison, so your whole test set needs on the order of 10^9 operations. Programmed in C, for example, that should finish within a few seconds. The performance you see is nowhere near that.
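
The size of the workload can be checked with a few lines of arithmetic. This sketch only counts pairs and series length; the constant factor hidden in O(n), which depends on fastdtw's radius parameter, is not modelled here, so the 10^9 figure remains a rough estimate rather than something this code derives.

```python
n_series = 687   # rows of sales
length = 157     # points per time series

n_pairs = n_series * (n_series - 1) // 2   # unordered pairs of series
points_compared = n_pairs * length         # O(n) work per pair, before any constant factor

print(n_pairs, points_compared)
```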

If we look at the code of fastdtw, we see that there are two versions: the Cython/C++ version, which is fast and imported via Cython, and a slow pure-Python fallback. If the fast version isn't present, the slow Python version is silently used.

So run your calculation, interrupt it with Ctrl+C, and the traceback will show that you are somewhere in Python code. You can also look in your lib folder and see that only the pure-Python version is there.
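
Instead of interrupting the run, the same check can be sketched with the standard library. `describe_module` is a hypothetical helper name; the module names `fastdtw._fastdtw` and `fastdtw.fastdtw` are taken from the directory listing and traceback in the question, and the assumption is that the compiled extension, when present, is importable under the first name.

```python
import importlib.machinery
import importlib.util

def describe_module(name):
    """Return (origin, is_compiled) for an importable module, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when the parent package itself is missing
        return None
    if spec is None or spec.origin is None:
        return None
    # Compiled extensions end in a platform suffix such as .so or .pyd
    is_compiled = spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))
    return spec.origin, is_compiled

print(describe_module("fastdtw._fastdtw"))  # compiled version, if installed
print(describe_module("fastdtw.fastdtw"))   # pure-Python fallback
```

If the first call prints None, or the reported path lies in a different site-packages directory than the one you built into, Python is probably not importing the compiled extension.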

So your installation of the fast fastdtw version failed. Actually, I think the wheel package is botched; at least for my version, only the pure-Python code is present.

What to do?

  1. Get the source code, for example via git clone https://github.com/slaypni/fastdtw
  2. Go into the fastdtw folder and run python setup.py build
  3. Watch out for errors. Mine was:

fatal error: numpy/npy_math.h: No such file or directory

  4. Fix it.

For me, the fix was to change the following lines in setup.py:

import numpy # THIS ADDED
extensions = [Extension(
        'fastdtw._fastdtw',
        [os.path.join('fastdtw', '_fastdtw' + ext)],
        language="c++",
        include_dirs=[numpy.get_include()], # AND ADDED numpy.get_include()
        libraries=["stdc++"]
    )]
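
The compiler error means the NumPy headers were not on the include path. `numpy.get_include()` returns the directory that holds them, which is why passing it to `include_dirs` resolves the error. A quick way to confirm the header exists in your environment (a sketch; the exact path varies by installation):

```python
import os
import numpy

include_dir = numpy.get_include()
header = os.path.join(include_dir, "numpy", "npy_math.h")

print("include dir:", include_dir)
print("npy_math.h found:", os.path.isfile(header))
```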

  5. Repeat steps 3 and 4 until the build succeeds.
  6. Run python setup.py install

Now your program should be about 100 times faster.
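
To check the speed-up on your own machine, time a fixed number of comparisons before and after reinstalling. This is a minimal sketch: `time_comparisons` and `l1_distance` are hypothetical names, and the stand-in distance is only there so the code runs without fastdtw; substitute `my_fastdtw` from the question to measure the real thing.

```python
import time
from itertools import combinations, islice

def time_comparisons(dist, series, max_pairs=50):
    """Average seconds per call of dist over the first max_pairs pairs of series."""
    pairs = list(islice(combinations(series, 2), max_pairs))
    start = time.perf_counter()
    for a, b in pairs:
        dist(a, b)
    return (time.perf_counter() - start) / len(pairs)

# Stand-in distance (not DTW), used only to make the sketch self-contained
def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Synthetic data with the same series length as in the question
series = [[float(i + k) for k in range(157)] for i in range(20)]
per_call = time_comparisons(l1_distance, series)
print(f"{per_call:.2e} s per comparison")
```

Multiplying the per-comparison time by the number of pairs (687 * 686 / 2) gives a rough estimate of the total runtime.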
