Clustering multivariate time series - question regarding distance matrix
Question
I am trying to cluster meteorological stations using R. The stations provide data such as temperature, wind speed, humidity and a few other variables at hourly intervals. I can easily cluster univariate time series using the tsclust library, but when I cluster multivariate series I get errors.
My data is a list in which each element is a matrix holding the time series of one station (columns are variables, rows are timestamps).
If I run:
tsclust(data, k = 2,
distance = 'Euclidean', seed = 3247, trace = TRUE)
I get the error:
Error in do.call(.External, c(list(CFUN, x, y, pairwise, if (!is.function(method)) get(method) else method), : not a scalar return value
I also tried:
dist(data, method="euclidean")
Maybe Euclidean distance cannot be calculated for such data? If so, which distances could be calculated instead?
Recommended answer
If your series all have the same length, you could transform each of them into a vector, cluster the vectors, and then re-adjust the dimensions afterwards. However, as Anony-Mousse mentioned, using Euclidean distance with variables that have different scales could be problematic, so consider normalizing with zscore first:
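# tsclust() and zscore() are provided by the dtwclust package
library(dtwclust)

# normalize the series before computing distances
series <- zscore(data)
# flatten each station's matrix into a single vector before clustering
pc <- tsclust(lapply(series, as.vector), distance="Euclidean", seed=3247L, trace=TRUE)
# store the normalized matrices (not the flattened vectors) in the result
pc@datalist <- series
# replace ncol with the actual number of columns from your data
pc@centroids <- lapply(pc@centroids, matrix, ncol=3L)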
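To see why flattening is safe for equal-length series, here is a minimal base R sketch on made-up data. The `stations` list, its column names, and the use of `scale()` as a stand-in for `zscore` are assumptions for illustration only, not part of the original answer. It checks that the Euclidean distance between flattened matrices equals the Frobenius norm of their difference, and that reshaping with the original column count restores the matrix layout.

```r
# Toy data (hypothetical): 3 stations, 4 hourly readings, 3 variables each
set.seed(1)
stations <- lapply(1:3, function(i) {
  cbind(temp = rnorm(4, 10, 5), wind = rnorm(4, 3, 1), hum = rnorm(4, 70, 10))
})

# z-score every column so that no single variable dominates the distance
scaled <- lapply(stations, scale)

# Flatten each matrix column-wise into one long vector
flat <- lapply(scaled, as.vector)

# Ordinary Euclidean distance matrix between the flattened stations
d <- dist(do.call(rbind, flat), method = "euclidean")

# The flattened distance equals the Frobenius norm of the matrix difference
frob <- sqrt(sum((scaled[[1]] - scaled[[2]])^2))
stopifnot(isTRUE(all.equal(as.matrix(d)[1, 2], frob)))

# Reshaping with the original column count restores the matrix layout
restored <- matrix(flat[[1]], ncol = ncol(scaled[[1]]))
stopifnot(all(restored == scaled[[1]]))
```

Taking the column count from the data itself, for example `ncol(series[[1L]])`, avoids hard-coding a value such as `3L` when restoring the centroids.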