Cylindrical Clustering in R - clustering timestamp with other data
Question
I'm learning R and I have to cluster numeric data that includes a timestamp field. One of the parameters is a time of day, and since the data is strictly day-night dependent, I want to take into account the "spherical" (periodic) nature of this variable.
As far as I can tell from the manual, libraries such as skmeans cannot handle "cylindrical" data but only "spherical" data (i.e. where all the components are in polar coordinates).
My idea for a suitable solution is the following: I can decompose the HOUR column (0-24) into two different columns X, Y and express the time as a point on the unit circle, so that x^2 + y^2 = 1. In this way a k-means with Euclidean distance should have no problem interpreting the data.
Am I right?
Recommended answer
Here is such a mapping of h to m, where h is the time in hours (and fractions of an hour). Then we try kmeans, and at least in this test it seems to work:
h <- c(22, 23, 0, 1, 2, 10, 11, 12)
ha <- 2*pi*h/24
m <- cbind(x = sin(ha), y = cos(ha))
kmeans(m, 2)$cluster # compute cluster assignments via kmeans; the labels 1 and 2 are arbitrary and may be swapped between runs
## [1] 2 2 2 2 2 1 1 1
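The test above clusters on the hour alone. For the "cylindrical" case in the question, where the time sits alongside other numeric columns, one possible approach is to bind the circular x, y pair to the remaining columns before calling kmeans. The sketch below is only an illustration under assumptions: the data frame df, its value column and the readings in it are made up, and standardising value with scale() is one design choice among several for balancing the linear axis against the circle.

```r
# Hypothetical example data: hour of day plus one other numeric measurement.
set.seed(42)  # kmeans uses random starts; fix the seed for repeatable runs
df <- data.frame(
  hour  = c(22, 23, 0, 1, 2, 10, 11, 12),
  value = c(5.1, 4.8, 5.3, 5.0, 4.9, 20.2, 19.7, 20.5)  # made-up readings
)

# Map the hour onto the unit circle so that 23:00 and 01:00 end up close.
angle <- 2 * pi * df$hour / 24
features <- cbind(
  x = sin(angle),
  y = cos(angle),
  value = as.numeric(scale(df$value))  # standardise the linear (cylinder) axis
)

# Several random starts reduce the chance of a poor local optimum.
fit <- kmeans(features, centers = 2, nstart = 10)
df$cluster <- fit$cluster
print(df)
```

The relative scale of value against the x, y pair decides how much the time of day influences the result, so it is worth trying different weights on real data.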