How to use a distance formula other than Euclidean distance in k-means
Question
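I am working with latitude/longitude data and have to cluster points based on the distance between them. The distance between two points is

=ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371

I want to use k-means in R. Is there any way to override the distance calculation in that process?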
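For reference, the formula in the question is the spherical law of cosines, with 6371 as the Earth's mean radius in kilometers. A minimal sketch of it follows, written in Python for illustration rather than R. It assumes the coordinates are stored in decimal degrees, so they are converted to radians first; the function name `great_circle_km` is made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # the constant used in the question's formula


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the spherical law of cosines.

    Assumption: inputs are decimal degrees. The trigonometric functions
    need radians, so the values are converted before use.
    """
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Clamp against floating-point drift so acos never sees a value outside [-1, 1].
    c = max(-1.0, min(1.0, c))
    return EARTH_RADIUS_KM * math.acos(c)


# A quarter of the equator: one quarter of the circumference 2 * pi * R.
quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

The clamp matters for identical or nearly identical points, where rounding can push the cosine slightly above 1 and make `acos` fail.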
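K-means is not distance based

It is based on variance minimization. The sum of within-cluster variances equals the sum of squared Euclidean distances to the cluster means, but the converse does not hold for other distances.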
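A small numeric check may make this concrete. The sketch below uses made-up one-dimensional data and compares the mean with the median as a cluster center under two cost functions. The mean wins for squared differences, which is the k-means objective, but not for absolute differences.

```python
from statistics import mean, median

points = [1.0, 2.0, 3.0, 10.0]  # illustrative data, not from the question


def sum_squared(center):
    """Sum of squared differences to a candidate center (the k-means cost)."""
    return sum((x - center) ** 2 for x in points)


def sum_absolute(center):
    """Sum of absolute differences to a candidate center (an L1 cost)."""
    return sum(abs(x - center) for x in points)


m, med = mean(points), median(points)
print(m, med)                               # 4.0 2.5
print(sum_squared(m), sum_squared(med))     # 50.0 59.0
print(sum_absolute(m), sum_absolute(med))   # 12.0 10.0
```

Moving the center to the mean is therefore only guaranteed to lower the cost when the cost is the squared Euclidean one. With another distance, the update step of k-means may increase the objective.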
If you want a k-means-like algorithm for other distances (where the mean is not an appropriate estimator), use k-medoids (PAM). In contrast to k-means, k-medoids will converge with arbitrary distance functions!

For Manhattan distance, you can also use k-medians. The median is an appropriate estimator for the L1 norm (the median minimizes the sum of absolute differences; the mean minimizes the sum of squared distances).

For your particular use case, you could also transform your data into 3D space, then use (squared) Euclidean distance and thus k-means. But your cluster centers will be somewhere underground!
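To illustrate the k-medoids suggestion, here is a minimal sketch in Python. It is a simple alternating variant (assign points, then re-pick medoids), not the full PAM swap procedure, and it assumes distinct points given as (latitude, longitude) pairs in decimal degrees. The names `k_medoids`, `assign`, and the `init` parameter are invented for this example. In R, the `pam` function from the `cluster` package is the usual choice and accepts a precomputed dissimilarity matrix.

```python
import math
import random


def great_circle_km(a, b):
    """Distance in km between two (lat, lon) pairs in degrees (spherical law of cosines)."""
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dlon = math.radians(b[1] - a[1])
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    return 6371.0 * math.acos(max(-1.0, min(1.0, c)))


def assign(points, medoids, dist):
    """Map each medoid index to the indices of the points closest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, dist, init=None, max_iter=100, seed=0):
    """Alternating k-medoids; returns (medoid indices, {medoid: member indices})."""
    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(len(points)), k)
    clusters = assign(points, medoids, dist)
    for _ in range(max_iter):
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster emptied out
                new_medoids.append(m)
                continue
            # The new medoid is the member with the smallest total
            # distance to all members of its cluster.
            best = min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break  # medoids are stable, so the assignment is too
        medoids = new_medoids
        clusters = assign(points, medoids, dist)
    return medoids, clusters


cities = [(48.86, 2.35), (51.51, -0.13), (50.85, 4.35),       # Paris, London, Brussels
          (35.68, 139.69), (34.69, 135.50), (37.57, 126.98)]  # Tokyo, Osaka, Seoul
medoids, clusters = k_medoids(cities, k=2, dist=great_circle_km, init=[0, 3])
print(sorted(clusters.values()))  # [[0, 1, 2], [3, 4, 5]]
```

Because a medoid is always one of the input points, the algorithm only needs pairwise distances and never averages coordinates. That is why any distance function can be plugged in as `dist`.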
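The 3D transformation mentioned in the answer can be sketched as follows, again in Python and again assuming decimal degrees and a spherical Earth of radius 6371 km. The helper names `to_xyz` and `centroid` are made up for this example. It also shows why the resulting cluster centers end up underground: the mean of several points on a sphere lies inside the sphere.

```python
import math

R_KM = 6371.0  # spherical Earth, same radius as in the question


def to_xyz(lat_deg, lon_deg, radius=R_KM):
    """Convert latitude/longitude in degrees to 3D Cartesian coordinates in km."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))


def centroid(vectors):
    """Component-wise mean, which is what k-means uses as a cluster center."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(3))


# Three illustrative points (roughly Paris, London, Brussels).
pts = [(48.86, 2.35), (51.51, -0.13), (50.85, 4.35)]
center = centroid([to_xyz(lat, lon) for lat, lon in pts])

# Distance of the center from the Earth's center is below the radius,
# so the center sits below the surface.
depth_km = R_KM - math.sqrt(sum(c * c for c in center))
print(depth_km > 0)  # True
```

Ordinary k-means can then be run on the (x, y, z) rows. The straight-line (chord) distance between two surface points grows monotonically with their great-circle distance, so nearest-center assignments in 3D are consistent with distances along the surface.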