How to use a distance formula other than Euclidean distance in k-means


Problem description

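I am working with latitude/longitude data and need to cluster points based on the distance between them. The distance between two points is computed as =ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371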

I want to use k-means in R. Is there any way to override the distance calculation it uses?

Solution

K-means is not distance-based.

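It is based on variance minimization. The within-cluster sum of variances equals the sum of squared Euclidean distances from each point to its cluster mean, but the converse does not hold: for other distances there is no such equivalence.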
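The equivalence can be written out for a single cluster. The notation is assumed here for illustration and does not come from the original answer: C is a cluster of D-dimensional points, \mu_C its mean, and \operatorname{Var}_C the population variance (dividing by |C|) of one coordinate within the cluster.

```latex
\[
\sum_{x \in C} \lVert x - \mu_C \rVert_2^2
  \;=\; \sum_{d=1}^{D} \sum_{x \in C} (x_d - \mu_{C,d})^2
  \;=\; |C| \sum_{d=1}^{D} \operatorname{Var}_C(x_d),
\qquad
\mu_C = \frac{1}{|C|} \sum_{x \in C} x
      = \arg\min_{c} \sum_{x \in C} \lVert x - c \rVert_2^2 .
\]
```

The last equality is what ties the mean to squared Euclidean distance specifically: replace the squared norm with another distance and the arithmetic mean is in general no longer the minimizer.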

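If you want a k-means-like algorithm for other distances (where the mean is not an appropriate estimator), use k-medoids (PAM). Unlike k-means, k-medoids will converge with arbitrary distance functions!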

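For Manhattan distance, you can also use k-medians. The median is the appropriate estimator for the L1 norm: the median minimizes the sum of absolute differences, while the mean minimizes the sum of squared differences.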
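The difference between the two estimators is easy to check numerically. The following is a small illustration in Python on made-up one-dimensional data; the helper names `sum_abs` and `sum_sq` are hypothetical.

```python
import statistics


def sum_abs(values, center):
    """Sum of absolute differences to a candidate center (L1 cost)."""
    return sum(abs(v - center) for v in values)


def sum_sq(values, center):
    """Sum of squared differences to a candidate center (squared L2 cost)."""
    return sum((v - center) ** 2 for v in values)


# Made-up data with one outlier, so mean and median differ clearly.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
mean = statistics.mean(values)
median = statistics.median(values)

l1_at_median, l1_at_mean = sum_abs(values, median), sum_abs(values, mean)
l2_at_median, l2_at_mean = sum_sq(values, median), sum_sq(values, mean)
```

On this data the median gives the lower L1 cost and the mean gives the lower squared cost, which is why k-medians pairs with Manhattan distance and k-means with squared Euclidean distance.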

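For your particular use case, you could also transform your data into 3D space and then use (squared) Euclidean distance, and thus k-means. But your cluster centers will be somewhere underground!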
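A sketch of that transformation in Python, assuming a unit sphere and hypothetical helper names. For two unit vectors separated by a great-circle angle θ, the squared Euclidean distance is 2 − 2·cos(θ), which grows monotonically with θ, so nearest-neighbour relations between surface points are preserved. The example also shows the "underground" effect and one possible way to project a center back onto the surface.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Map (lat, lon) in degrees to a point on the unit sphere in 3D."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def centroid(vectors):
    """Arithmetic mean of 3D points, as k-means would compute for a cluster."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(3))


def to_lat_lon(v):
    """Project a non-zero 3D point radially onto the sphere; return (lat, lon) in degrees."""
    x, y, z = v
    norm = math.sqrt(x * x + y * y + z * z)
    return (math.degrees(math.asin(z / norm)), math.degrees(math.atan2(y, x)))


# Two made-up points on the equator, 90 degrees of longitude apart.
points = [(0.0, 0.0), (0.0, 90.0)]
center = centroid([to_unit_vector(lat, lon) for lat, lon in points])

# Fraction of the radius by which the center lies below the surface.
depth_fraction = 1.0 - math.sqrt(sum(c * c for c in center))
surface_center = to_lat_lon(center)
```

The radial projection fails if a center lands exactly at the sphere's origin (for example, two antipodal points), so real data spanning the whole globe would need that case handled.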

