Cylindrical Clustering in R - clustering timestamp with other data


Question

I'm learning R and I need to cluster numeric data that includes a timestamp field. One of the parameters is the time of day, and since the data is strictly day-night dependent, I want to take into account the "spherical" nature of this data.

As far as I can tell from the manual, libraries such as skmeans cannot handle "cylindrical" data, only "spherical" data (i.e. where all the components are in polar coordinates).

My idea for a suitable solution is the following: I can decompose the HOUR column (0-24) into two columns X and Y that express the time in polar coordinates, so that x^2+y^2=1. That way, k-means with Euclidean distance should have no problem interpreting the data.

Am I right?

Recommended Answer

Here is such a mapping of h to m, where h is the time in hours (including any fraction of an hour). Then we try kmeans, and at least in this test it seems to work:

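# hours of day: five around midnight, three around noon
h <- c(22, 23, 0, 1, 2, 10, 11, 12)
# convert hours to an angle in radians (a full day is 2*pi)
ha <- 2*pi*h/24
# place each time on the unit circle, so x^2 + y^2 = 1
m <- cbind(x = sin(ha), y = cos(ha))

kmeans(m, 2)$cluster # compute cluster assignments via kmeans
## [1] 2 2 2 2 2 1 1 1
## (cluster labels are arbitrary and may be swapped between runs)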
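
To connect the mapping above to the original question (time of day plus other numeric columns), here is a minimal sketch rather than a definitive recipe. The helper `hour_to_circle()`, the `value` column and its numbers are all made up for illustration and are not part of the original answer. It first checks that 23:00 and 01:00 end up close together after the mapping, then clusters the circular coordinates together with a standardised second column.

```r
# Hypothetical helper (not from the original answer):
# map hour-of-day onto the unit circle so that 0 and 24 coincide.
hour_to_circle <- function(h, period = 24) {
  angle <- 2 * pi * (h %% period) / period
  cbind(hour_x = sin(angle), hour_y = cos(angle))
}

# 23:00 and 01:00 differ by 22 numerically but by only 2 hours on the clock.
p <- hour_to_circle(c(23, 1, 11))
d <- as.matrix(dist(p))
d[1, 2]  # chord between 23:00 and 01:00, equal to 2 * sin(pi / 12)
d[1, 3]  # chord between 23:00 and 11:00, the full diameter (2)

# Made-up example data: low readings at night, high readings around noon.
dat <- data.frame(
  hour  = c(22, 23, 0, 1, 2, 10, 11, 12),
  value = c(5.1, 4.8, 5.0, 5.3, 4.9, 20.2, 19.7, 20.5)
)

# Standardise the non-circular column so it is on a scale comparable
# to the circle coordinates (assumption: equal weighting is acceptable).
features <- cbind(hour_to_circle(dat$hour),
                  value = as.numeric(scale(dat$value)))

set.seed(1)  # kmeans starts from random centres
fit <- kmeans(features, centers = 2, nstart = 10)
split(dat$hour, fit$cluster)  # hours grouped by assigned cluster
```

How strongly the time of day should count relative to the other columns is a modelling choice. Multiplying the two circle columns by a weight before calling kmeans is one way to tune it.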
