Different results and performances with different libraries


Question

I'm comparing the libraries dtaidistance, fastdtw and cdtw for DTW computations. This is my code:

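import array
from timeit import default_timer as timer

from cdtw import pydtw
from dtaidistance import dtw, dtw_visualisation as dtwvis
# Import only the function: a later `import fastdtw` would rebind the name
# to the module and make the call fastdtw(s1, s2) fail.
from fastdtw import fastdtw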

s1 = mySampleSequences[0] # first sample sequence consisting of 3000 samples
s2 = mySampleSequences[1] # second sample sequence consisting of 3000 samples

start = timer()
distance1 = dtw.distance(s1, s2)
end = timer()
start2 = timer()
distance2 = dtw.distance_fast(array.array('d',s1),array.array('d',s2))
end2 = timer()
start3 = timer()
distance3, path3 = fastdtw(s1,s2)
end3 = timer()
start4 = timer()
distance4 = pydtw.dtw(s1,s2).get_dist()
end4 = timer()

print("dtw.distance(x,y) time: "+ str(end - start))
print("dtw.distance(x,y) distance: "+str(distance1))
print("dtw.distance_fast(x,y) time: "+ str(end2 - start2))
print("dtw.distance_fast(x,y) distance: " + str(distance2))
print("fastdtw(x,y) time: "+ str(end3 - start3))
print("fastdtw(x,y) distance: " + str(distance3))
print("pydtw.dtw(x,y) time: "+ str(end4 - start4))
print("pydtw.dtw(x,y) distance: " + str(distance4))

This is the output I get:

  • dtw.distance(x,y) time: 22.16925272245262
  • dtw.distance(x,y) distance: 1888.8583853746156
  • dtw.distance_fast(x,y) time: 0.3889036471839056
  • dtw.distance_fast(x,y) distance: 1888.8583853746156
  • fastdtw(x,y) time: 0.23296659641047412
  • fastdtw(x,y) distance: 27238.0
  • pydtw.dtw(x,y) time: 0.13706478039556558
  • pydtw.dtw(x,y) distance: 17330.0

My question is: Why do I get different performances and different distances? Thank you very much for your comments.

// edit: The unit of the time measurements is seconds.

Recommended answer

Some additional information on top of Felipe Mello's informative answer (disclaimer: author of DTAIDistance here).

For the distance results:

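  • DTAIDistance only uses the Euclidean distance (or L2 norm); this is hardcoded. This choice was made to speed up the execution of the C code (no function calls). The "fast" refers to using the C-based implementation instead of the pure Python version, so both methods give exactly the same results.
  • FastDTW is a different algorithm than DTW. It is a linear approximation. The "fast" refers to a lower complexity.
  • cDTW. I am not very familiar with this toolbox, but it seems to implement the L1 norm.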
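To make the effect of the norm concrete, here is a minimal pure-Python sketch of exact DTW that can accumulate either squared differences (square root taken at the end, the L2 behaviour described above for DTAIDistance) or absolute differences (L1). The function name `dtw_distance` and its `norm` parameter are illustrative assumptions and do not belong to any of the three libraries.

```python
import math

def dtw_distance(s1, s2, norm="l2"):
    """Exact DTW between two 1-D sequences (hypothetical helper).

    norm="l2": accumulate squared differences, take the square root at the end.
    norm="l1": accumulate absolute differences.
    """
    inf = float("inf")
    n, m = len(s1), len(s2)
    # cost[i][j] = cheapest accumulated cost of aligning s1[:i] with s2[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = s1[i - 1] - s2[j - 1]
            local = d * d if norm == "l2" else abs(d)
            cost[i][j] = local + min(cost[i - 1][j],      # step in s1 only
                                     cost[i][j - 1],      # step in s2 only
                                     cost[i - 1][j - 1])  # step in both
    total = cost[n][m]
    return math.sqrt(total) if norm == "l2" else total

a = [0, 3, 0, 4, 0]
b = [0, 0, 0, 0, 0]
l2_result = dtw_distance(a, b, norm="l2")
l1_result = dtw_distance(a, b, norm="l1")
print(l2_result, l1_result)
```

For these two toy sequences the L2 variant returns 5.0 and the L1 variant returns 7.0, so the same pair of series yields different numbers purely because of the norm. An approximate algorithm such as FastDTW can move the value further away from the exact result.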

For the speed results:

In general, pure C-based algorithms are ~100 times faster than pure Python ones (in DTAIDistance this is the difference between distance() and distance_fast()). For the C-based methods, the differences are mainly due to the flexibility of the methods. Passing a custom norm, for example, slows the method down (more function calls). Different methods also have different options, which lead to more or fewer switch statements in the algorithm. For example, DTAIDistance offers quite a number of options to tune the method because it prefers stopping the computation early over further optimizations (also observed by Felipe Mello). Furthermore, different methods store different amounts of data. The DTAIDistance distance method does not store the entire matrix, so that it can also offer linear space complexity (the full matrix is obtained using the warping_paths method, which has quadratic space complexity). In general, for DTW it is recommended to use a window to also reduce the time complexity a bit.
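The two ideas at the end of that paragraph, linear space and a window, can be sketched in pure Python as follows. This is an illustration under stated assumptions, not DTAIDistance's implementation: the name `dtw_distance_windowed` is hypothetical, and `window` here means the maximum allowed |i - j| between aligned positions, which may differ from how a given library defines its own window parameter.

```python
import math

def dtw_distance_windowed(s1, s2, window=None):
    """DTW with squared differences, keeping only two rows in memory.

    window: maximum allowed |i - j| between aligned positions
            (assumed definition); None means no constraint.
    """
    inf = float("inf")
    n, m = len(s1), len(s2)
    if window is None:
        window = max(n, m)
    # The band must be wide enough to reach the final cell (n, m).
    window = max(window, abs(n - m))
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        # Only cells inside the band around the diagonal are evaluated.
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = s1[i - 1] - s2[j - 1]
            curr[j] = d * d + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr  # the older row is no longer needed
    return math.sqrt(prev[m])

a = [0, 0, 0, 5, 0]
b = [0, 5, 0, 0, 0]
unconstrained = dtw_distance_windowed(a, b)
wide_band = dtw_distance_windowed(a, b, window=2)
diagonal_only = dtw_distance_windowed(a, b, window=0)
print(unconstrained, wide_band, diagonal_only)
```

With no window, or with a window of 2, the two peaks are aligned and the distance is 0.0. With a window of 0 only the diagonal is allowed, so the result degenerates to the plain Euclidean distance. A narrower band means fewer cell evaluations, at the price of possibly excluding the best warping path.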

For DTAIDistance, all the design choices were made with timeseries clustering applications in mind (the distance_matrix_fast method). This is another reason not to allow custom norms. The DTW code needs to be pure C to support parallelization on the level of C-code and have minimal overhead (it uses OpenMP) to compute all pairwise distances between series.
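The clustering use case boils down to computing every pairwise distance between a set of series. The sketch below shows that structure in plain, sequential Python. The helper names `dtw` and `pairwise_distance_matrix` are illustrative assumptions and do not reproduce distance_matrix_fast or its OpenMP parallelization.

```python
import math
from itertools import combinations

def dtw(s1, s2):
    """Compact two-row DTW with squared differences (illustrative helper)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(s2)
    for x in s1:
        curr = [inf] * (len(s2) + 1)
        for j, y in enumerate(s2, start=1):
            curr[j] = (x - y) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[-1])

def pairwise_distance_matrix(series, dist=dtw):
    """Symmetric matrix of distances between all pairs of series."""
    k = len(series)
    matrix = [[0.0] * k for _ in range(k)]
    # Each pair is independent of all others, which is why this loop
    # is a natural candidate for parallel execution.
    for i, j in combinations(range(k), 2):
        matrix[i][j] = matrix[j][i] = dist(series[i], series[j])
    return matrix

series = [
    [0, 3, 0, 4, 0],
    [0, 0, 0, 0, 0],
    [0, 3, 0, 4, 0, 0],
]
matrix = pairwise_distance_matrix(series)
for row in matrix:
    print(row)
```

The number of DTW calls grows quadratically with the number of series, so any per-call overhead (such as calling back into Python for a custom norm) is multiplied accordingly. That is consistent with the design choice described above of keeping the inner loop in pure C.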
