Time series distance metric
Question
To cluster a set of time series, I'm looking for a smart distance metric. I've tried some well-known metrics, but none of them fits my case.
Example: let's assume that my clustering algorithm extracts these three centroids [s1, s2, s3]:
I want to put this new example [sx] in the most similar cluster:
The most similar centroid is the second one, so I need to find a distance function d that gives me d(sx, s2) < d(sx, s1) and d(sx, s2) < d(sx, s3).
Edit
Here are the results with the metrics [cosine, euclidean, minkowski, dynamic time warping]:
Edit 2
User Pietro P suggested applying the distances to the cumulated version of the time series. The solution works; here are the plots and the metrics:
Recommended answer
Nice question! Using any standard distance on R^n (Euclidean, Manhattan, or more generally Minkowski) over those time series cannot achieve the result you want, since those metrics are invariant under permutations of the coordinates of R^n (the same permutation applied to both series), whereas time is strictly ordered and that ordering is the phenomenon you want to capture.
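To make the permutation argument concrete, here is a small sketch in plain Python. The series below are made-up illustrations (a single spike at different time steps), not the data from the question, and the helper names are hypothetical. It shows that a spike shifted by one step and a spike shifted by four steps end up at the same Euclidean and Manhattan distance, and that shuffling the time axis of both series leaves the distance unchanged.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def permute(series, order):
    """Reorder the time steps of a series according to `order`."""
    return [series[i] for i in order]

# Hypothetical series: one spike each, at different time steps.
sx = [0, 1, 0, 0, 0, 0]    # spike at t=1
near = [0, 0, 1, 0, 0, 0]  # spike at t=2, one step away
far = [0, 0, 0, 0, 0, 1]   # spike at t=5, four steps away

# Both metrics see "near" and "far" as equally distant from sx.
d_near = euclidean(sx, near)
d_far = euclidean(sx, far)
print(d_near == d_far)
print(manhattan(sx, near), manhattan(sx, far))

# Applying the same arbitrary reordering of time steps to both series
# does not change the distance either.
order = [5, 3, 1, 0, 4, 2]
d_shuffled = euclidean(permute(sx, order), permute(far, order))
print(d_shuffled == d_far)
```

In other words, these metrics only compare values position by position; they carry no notion of how far apart two positions are in time.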
A simple trick that can do what you ask is to use the cumulated version of each time series (the running sum of its values as time increases) and then apply a standard metric. With the Manhattan metric, the distance between two time series is then the area between their cumulated versions.
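A minimal sketch of this trick, assuming all series have the same length and sampling. The centroids and the new example are invented single-spike series, not the ones plotted in the question, and `cumulative_manhattan` and `nearest_centroid` are hypothetical helper names.

```python
from itertools import accumulate

def cumulated(series):
    """Running sum of the series over time."""
    return list(accumulate(series))

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cumulative_manhattan(a, b):
    """Manhattan distance between the cumulated versions of a and b."""
    return manhattan(cumulated(a), cumulated(b))

def nearest_centroid(sample, centroids):
    """Index of the centroid closest to `sample` under cumulative_manhattan."""
    return min(range(len(centroids)),
               key=lambda i: cumulative_manhattan(sample, centroids[i]))

# Hypothetical centroids and new example: one spike each.
s1 = [1, 0, 0, 0, 0, 0]  # spike at t=0
s2 = [0, 0, 1, 0, 0, 0]  # spike at t=2
s3 = [0, 0, 0, 0, 0, 1]  # spike at t=5
sx = [0, 0, 0, 1, 0, 0]  # spike at t=3, closest in time to s2

centroids = [s1, s2, s3]
plain = [manhattan(sx, c) for c in centroids]
cumul = [cumulative_manhattan(sx, c) for c in centroids]
best = nearest_centroid(sx, centroids)

print(plain)  # [2, 2, 2]: plain Manhattan cannot tell the centroids apart
print(cumul)  # [3, 1, 2]: the cumulated version grows with the time shift
print(best)   # 1, i.e. s2
```

For these unit spikes the cumulated distance equals the number of time steps separating the two spikes, which is why it gives d(sx, s2) < d(sx, s1) and d(sx, s2) < d(sx, s3) here.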