I have 2,000,000 points in a 100-dimensional space. How can I cluster them into K (e.g., 1000) clusters?

Problem description

The problem is as follows. I have M images and extract N features from each image; each feature is L-dimensional. Thus, I have M*N features (2,000,000 in my case), each of dimensionality L (100 in my case). I need to cluster these M*N features into K clusters. How can I do it? Thanks.

Recommended answer

Do you want 1000 clusters of images, of features, or of (image, feature) pairs?
In any case, it sounds as though you'll have to reduce the data and use simpler methods.

One possibility is two-pass K-clustering:
a) split the 2 million data points into 32 clusters,
b) split each of these 32 clusters into 32 more.
If this works, the resulting 32^2 = 1024 clusters might be good enough for your purpose.
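The two passes above can be sketched roughly as follows. This is only an illustration under assumptions: each pass is taken to be plain Lloyd's k-means written in NumPy, and the names `assign`, `kmeans`, and `two_pass_kmeans` are made up for this example. It runs on a small random stand-in dataset rather than the real 2,000,000 x 100 features.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, computed for all pairs at once.
    d2 = (X**2).sum(1)[:, None] - 2.0 * (X @ centers.T) + (centers**2).sum(1)[None, :]
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed here; the answer does not name a method)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers, assign(X, centers)

def two_pass_kmeans(X, k1, k2, seed=0):
    """Pass a): k1 coarse clusters. Pass b): split each coarse cluster into k2."""
    _, coarse = kmeans(X, k1, seed=seed)
    labels = np.empty(len(X), dtype=np.int64)
    for c in range(k1):
        idx = np.flatnonzero(coarse == c)
        if len(idx) == 0:
            continue
        # A coarse cluster with fewer than k2 points gets fewer sub-clusters.
        _, fine = kmeans(X[idx], min(k2, len(idx)), seed=seed + 1 + c)
        labels[idx] = c * k2 + fine  # final label in [0, k1 * k2)
    return labels

# Small stand-in data; the real problem would use k1 = k2 = 32.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 10))
labels = two_pass_kmeans(X, k1=4, k2=4)
print(len(np.unique(labels)), "clusters found")
```

At full scale the distance matrix in `assign` for the first pass would be 2,000,000 x 32 float64 values (about 512 MB), so you would probably want to process the points in chunks or switch to a mini-batch variant.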

Then, do you really need all 100 coordinates? Could you guess the 20 most important ones, or just try random subsets of 20?
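Both ideas are easy to try. A minimal sketch, with two assumptions: the helper names `random_subset` and `top_variance_subset` are made up for this example, and "most important" is guessed here as "highest variance", which is only one possible heuristic and not something stated above. The toy data is built so that the last 20 columns clearly dominate.

```python
import numpy as np

def random_subset(X, m, seed=0):
    """Keep m coordinates chosen uniformly at random (without replacement)."""
    rng = np.random.default_rng(seed)
    cols = np.sort(rng.choice(X.shape[1], size=m, replace=False))
    return X[:, cols], cols

def top_variance_subset(X, m):
    """Keep the m coordinates with the largest variance (an assumed heuristic)."""
    cols = np.sort(np.argsort(X.var(axis=0))[-m:])
    return X[:, cols], cols

# Toy data: 100 coordinates, of which the last 20 have much larger spread.
rng = np.random.default_rng(0)
scales = np.concatenate([np.full(80, 0.1), np.full(20, 5.0)])
X = rng.normal(size=(1000, 100)) * scales

X_rand, cols_r = random_subset(X, 20, seed=1)
X_var, cols_v = top_variance_subset(X, 20)
print(X_rand.shape, X_var.shape)
print(cols_v)
```

Clustering several different random subsets and comparing the results is a cheap way to see how much the choice of coordinates matters before investing in anything heavier.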

There's a huge literature: Googling +image "dimension reduction" gives ~70,000 hits.
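As one concrete entry point into that literature, here is a sketch of PCA, a standard dimension reduction technique (my pick for illustration, not necessarily the best fit for image features). The function name `pca_reduce` is made up for this example. It works from the small L x L covariance matrix, so the expensive step does not grow with the number of points.

```python
import numpy as np

def pca_reduce(X, m):
    """Project X onto its m leading principal components."""
    Xc = X - X.mean(axis=0)              # center each coordinate
    cov = (Xc.T @ Xc) / (len(X) - 1)     # L x L covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    components = eigvecs[:, ::-1][:, :m]     # m directions of largest variance
    return Xc @ components

# Toy data with correlated coordinates (random stand-in for the real features).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100)) @ rng.normal(size=(100, 100))
Z = pca_reduce(X, 20)
print(Z.shape)
```

The reduced array `Z` can then be clustered in place of the original 100-dimensional features.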
