相互最远的k个元素(聚类?) [英] Most mutually distant k elements (clustering?)

查看:65
本文介绍了相互最远的k个元素(聚类?)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个简单的机器学习问题:

I have a simple machine learning question:

我有n(〜110)个元素,以及所有成对距离的矩阵.我想选择距离最远的10个元素.也就是说,我想

I have n (~110) elements, and a matrix of all the pairwise distances. I would like to choose the 10 elements that are most far apart. That is, I want to

Maximize:
  Choose 10 different elements.
  Return min distance over (all pairings within the 10).

我的距离度量是对称的,并且考虑到三角形不等式.

My distance metric is symmetric and respects the triangle inequality.

我可以使用哪种算法?我的第一个直觉是执行以下操作:

What kind of algorithm can I use? My first instinct is to do the following:

  1. 将n个元素聚集成20个 集群.
  2. 仅将每个群集替换为 该集群的元素是 离...的均值元素最远 原始的.
  3. 使用蛮力解决 剩下的20个问题 考生.幸运的是20选择10是 只有184,756.
  1. Cluster the n elements into 20 clusters.
  2. Replace each cluster with just the element of that cluster that is furthest from the mean element of the original n.
  3. Use brute force to solve the problem on the remaining 20 candidates. Luckily, 20 choose 10 is only 184,756.


由于etarion的深刻见解,在优化问题说明中将(距离)的返回总和"更改为返回最小距离".


thanks to etarion's insightful comment, changed "Return sum of (distances)" to "Return min distance" in the optimization problem statement.

推荐答案

在这里,您可以通过采用凸松弛来解决此组合优化问题.

Here's how you might approach this combinatorial optimization problem by taking the convex relaxation.

让D为一个上三角矩阵,您的距离在上三角上. IE.我在哪里j,D_i,j是元素i和j之间的距离. (大概,对角线上也会有零.)

Let D be an upper triangular matrix with your distances on the upper triangle. I.e. where i < j, D_i,j is the distance between elements i and j. (Presumably, you'll have zeros on the diagonal, as well.)

然后您的目标是最大化x'* D * x,其中x是二进制值,其中10个元素设置为1,其余元素设置为0.(将x中的ith项设置为1类似于选择ith元素为您的10个元素之一.)

Then your objective is to maximize x'*D*x, where x is binary valued with 10 elements set to 1 and the rest to 0. (Setting the ith entry in x to 1 is analogous to selecting the ith element as one of your 10 elements.)

与这样的组合问题有关的标准"凸优化问题是放宽约束,以使x不必为离散值.这样做给我们带来了以下问题:

The "standard" convex optimization thing to do with a combinatorial problem like this is to relax the constraints such that x need not be discrete valued. Doing so gives us the following problem:

最大化y'* D * y 满足:对于所有i,0< = y_i< = 1,1'* y = 10

maximize y'*D*y subject to: 0 <= y_i <= 1 for all i, 1'*y = 10

(道德上)这是二次方程序. (如果我们用D + D'替换D,它将变成真正的二次程序,得到的y应该没有什么不同.)您可以使用现成的QP解算器,或直接将其插入您选择的凸优化求解器(例如cvx).

This is (morally) a quadratic program. (If we replace D with D + D', it'll become a bona fide quadratic program and the y you get out should be no different.) You can use an off-the-shelf QP solver, or just plug it in to the convex optimization solver of your choice (e.g. cvx).

得到的y不一定是(也可能不是)二进制向量,但是您可以通过多种方式将标量值转换为离散值. (最简单的方法可能是在y_i最高的10个条目中让x为1,但是您可能需要做一些更复杂的事情.)无论如何,y'* D * y与得到的y的确给出您是x'* D * x最佳值的上限,因此,如果从y构造的x具有非常接近y'* D * y的x'* D * x,您可以对近似值感到满意.

The y you get out need not be (and probably won't be) a binary vector, but you can convert the scalar values to discrete ones in a bunch of ways. (The simplest is probably to let x be 1 in the 10 entries where y_i is highest, but you might need to do something a little more complicated.) In any case, y'*D*y with the y you get out does give you an upper bound for the optimal value of x'*D*x, so if the x you construct from y has x'*D*x very close to y'*D*y, you can be pretty happy with your approximation.

请让我知道其中是否不清楚,符号或其他方式.

Let me know if any of this is unclear, notation or otherwise.

这篇关于相互最远的k个元素(聚类?)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆