Clustering on a very large sparse matrix?
Question
I am trying to do some (k-means) clustering on a very large matrix.
The matrix is approximately 500,000 rows x 4,000 columns, yet very sparse (only a couple of "1" values per row). I want to get around 2,000 clusters.
I have two questions:
- Can someone recommend an open source platform or tool for doing that (maybe using k-means, maybe with something better)?
- How can I best estimate the time the algorithm will need to finish? I tried Weka once, but aborted the job after a couple of days because I couldn't tell how much time it would take.
Thanks!
Recommended answer
http://lucene.apache.org/mahout/
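On the second question (estimating the running time), one rough approach is to time a single k-means iteration on a small sample and scale the result up. The sketch below is not Mahout and not a recommendation of a particular tool. It is a plain-Python illustration that assumes binary rows stored as tuples of column indices, and every function name in it is hypothetical. It also assumes the per-iteration cost grows roughly linearly with both the number of rows and the number of clusters, which ignores memory and cache effects.

```python
import random
import time


def make_sparse_rows(n_rows, n_cols, ones_per_row, rng):
    """Synthetic stand-in data: each row is a tuple of column indices that hold a 1."""
    return [tuple(sorted(rng.sample(range(n_cols), ones_per_row)))
            for _ in range(n_rows)]


def densify(row, n_cols):
    """Turn one sparse binary row into a dense list, used for the initial centroids."""
    dense = [0.0] * n_cols
    for i in row:
        dense[i] = 1.0
    return dense


def kmeans_iteration(rows, centroids):
    """One Lloyd iteration (assign, then update) that only touches non-zero entries."""
    n_cols = len(centroids[0])
    norms = [sum(v * v for v in c) for c in centroids]
    labels = []
    for row in rows:
        best, best_d = 0, float("inf")
        for j, c in enumerate(centroids):
            # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; ||x||^2 is the same for
            # every centroid, so it is dropped from the comparison.
            d = norms[j] - 2.0 * sum(c[i] for i in row)
            if d < best_d:
                best, best_d = j, d
        labels.append(best)

    sums = [[0.0] * n_cols for _ in centroids]
    counts = [0] * len(centroids)
    for row, lab in zip(rows, labels):
        counts[lab] += 1
        for i in row:
            sums[lab][i] += 1.0

    new_centroids = []
    for j, c in enumerate(centroids):
        if counts[j] == 0:
            new_centroids.append(c)  # empty cluster: keep the old centroid
        else:
            new_centroids.append([v / counts[j] for v in sums[j]])
    return new_centroids, labels


def estimate_seconds_per_iteration(sample_seconds, sample_rows, sample_k,
                                   full_rows, full_k):
    """Linear extrapolation in rows and clusters (an assumption, not a guarantee)."""
    return sample_seconds * (full_rows / sample_rows) * (full_k / sample_k)


rng = random.Random(0)
n_cols = 4000
rows = make_sparse_rows(1000, n_cols, 3, rng)   # small sample, 3 ones per row
k = 20                                          # small number of clusters
centroids = [densify(r, n_cols) for r in rng.sample(rows, k)]

start = time.perf_counter()
new_centroids, labels = kmeans_iteration(rows, centroids)
elapsed = time.perf_counter() - start

per_iter = estimate_seconds_per_iteration(elapsed, len(rows), k, 500000, 2000)
print(f"sample iteration: {elapsed:.3f} s, "
      f"extrapolated full-size iteration: {per_iter:.0f} s")
```

The total running time is this per-iteration figure multiplied by the number of iterations until convergence, which is not known in advance. Timing two or three sample sizes and checking that the scaling really is close to linear makes the extrapolation more trustworthy. The same measurement can be done with whichever tool is actually used by running it on a subsample first.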