I have 2,000,000 points in 100-dimensional space. How can I cluster them into K (e.g., 1000) clusters?
Problem Description
The problem is as follows. I have M images and extract N features from each image; each feature has dimensionality L. Thus I have M*N features (2,000,000 in my case), each of dimensionality L (100 in my case). I need to cluster these M*N features into K clusters. How can I do this? Thanks.
Recommended Answer
Do you want 1000 clusters of images, of features, or of (image, feature) pairs?
In any case, it sounds as though you'll have to reduce the data and use simpler methods.
One possibility is a two-pass K-clustering:
a) split the 2 million data points into 32 clusters,
b) split each of these into 32 more.
If this works, the resulting 32^2 = 1024 clusters might be good enough for your purpose.
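A minimal sketch of the two-pass idea, assuming plain Lloyd's k-means at each level (the answer does not name a specific clustering algorithm). The helper names `kmeans` and `two_pass_cluster` are hypothetical. It is pure Python so it runs without dependencies; at 2,000,000 × 100 you would vectorize with NumPy or use a library implementation (e.g. scikit-learn's `KMeans` or `MiniBatchKMeans`).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed here); returns one label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))  # cannot have more clusters than points
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_pass_cluster(points, k1=32, k2=32):
    """Split points into k1 clusters, then split each of those into k2 more.

    Returns a label in range(k1 * k2) for every point.
    """
    top = kmeans(points, k1)
    labels = [0] * len(points)
    for c in range(k1):
        idx = [i for i, lab in enumerate(top) if lab == c]
        if not idx:
            continue
        sub = kmeans([points[i] for i in idx], k2)
        for i, s in zip(idx, sub):
            labels[i] = c * k2 + s  # combine first- and second-pass labels
    return labels


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data only: 1,000 random points in 10 dimensions, not the real 2M x 100.
    data = [[rng.random() for _ in range(10)] for _ in range(1000)]
    labels = two_pass_cluster(data, k1=4, k2=4)
    print(len(set(labels)), "clusters")  # at most 4 * 4 = 16
```

Each second-pass run only sees the points of one first-pass cluster, so it is much cheaper than a single 1024-way run over all points; the trade-off is that a point can never move to a sibling cluster across a first-pass boundary.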
Then, do you really need 100 coordinates? Could you guess the 20 most important ones, or just try random subsets of 20?
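A small sketch of both options for cutting 100 coordinates down to 20. The helper names are hypothetical, and "largest variance" is only an assumed, crude stand-in for "most important"; the answer does not say how to pick them.

```python
import random
import statistics


def random_coords(dim, k, seed=None):
    """Pick k distinct coordinate indices out of range(dim), sorted."""
    return sorted(random.Random(seed).sample(range(dim), k))


def top_variance_coords(points, k):
    """Assumed heuristic: treat the k highest-variance coordinates as most important."""
    variances = [statistics.pvariance(col) for col in zip(*points)]
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def project(points, coords):
    """Keep only the chosen coordinates of every point."""
    return [[p[j] for j in coords] for p in points]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 500 points in 100 dimensions.
    data = [[rng.gauss(0, 1) for _ in range(100)] for _ in range(500)]
    reduced = project(data, random_coords(100, 20, seed=42))
    print(len(reduced[0]))  # 20
```

To "try" several random subsets, you would cluster each projected copy and keep the subset whose clustering you judge best; distances in 20 dimensions are roughly five times cheaper to compute than in 100.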
There's a huge literature on this: a Google search for +image "dimension reduction" gives ~70,000 hits.