Clustering on a very large sparse matrix?


Question

I am trying to do some (k-means) clustering on a very large matrix.

The matrix is approximately 500,000 rows x 4,000 columns, yet very sparse (only a couple of "1" values per row). I want to get around 2,000 clusters.

I have two questions:
- Can someone recommend an open source platform or tool for doing that (maybe using k-means, maybe with something better)?
- How can I best estimate the time the algorithm will need to finish? I tried Weka once, but aborted the job after a couple of days because I couldn't tell how much time it would take.

Thanks!

Recommended answer

http://lucene.apache.org/mahout/
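
On the timing question, one rough approach is to time a single k-means iteration on a small random sample and extrapolate. This is an illustration only, not something the answer above prescribes and not how Mahout or Weka work internally. It relies on the fact that one Lloyd iteration costs roughly linear time in the number of rows when the number of clusters is fixed. The sketch below uses only the Python standard library and stores each sparse binary row as a list of its nonzero column indices. All function names are hypothetical, the sizes in the demo are toy values, and the iteration count passed to the estimate is a guess, since the number of iterations to convergence is not known in advance.

```python
import random
import time


def assign(rows, centroids, centroid_sq_norms):
    """Assign each sparse binary row (list of nonzero column indices)
    to the index of its nearest dense centroid."""
    labels = []
    for cols in rows:
        best, best_dist = 0, float("inf")
        for j, c in enumerate(centroids):
            # For a binary row x: ||x - c||^2 = |x| - 2 * x.c + ||c||^2
            dot = sum(c[col] for col in cols)
            dist = len(cols) - 2.0 * dot + centroid_sq_norms[j]
            if dist < best_dist:
                best, best_dist = j, dist
        labels.append(best)
    return labels


def update(rows, labels, k, n_cols):
    """Recompute each centroid as the mean of its assigned rows.
    An empty cluster is left as an all-zero centroid (a simplification)."""
    sums = [[0.0] * n_cols for _ in range(k)]
    counts = [0] * k
    for cols, lab in zip(rows, labels):
        counts[lab] += 1
        for col in cols:
            sums[lab][col] += 1.0
    for j in range(k):
        if counts[j]:
            sums[j] = [v / counts[j] for v in sums[j]]
    return sums


def time_one_iteration(rows, k, n_cols, seed=0):
    """Time one assign + update pass, starting from k random rows as centroids."""
    rng = random.Random(seed)
    centroids = []
    for cols in rng.sample(rows, k):
        dense = [0.0] * n_cols
        for col in cols:
            dense[col] = 1.0
        centroids.append(dense)
    start = time.perf_counter()
    norms = [sum(v * v for v in c) for c in centroids]
    labels = assign(rows, centroids, norms)
    update(rows, labels, k, n_cols)
    return time.perf_counter() - start


def estimate_total_seconds(sample_seconds, sample_rows, total_rows, iterations):
    """Linear extrapolation: per-iteration time scales with the row count
    (same k and column count assumed), times an assumed iteration count."""
    return sample_seconds * (total_rows / sample_rows) * iterations


if __name__ == "__main__":
    rng = random.Random(42)
    n_cols, k = 50, 5  # toy sizes; the question has 4000 columns and ~2000 clusters
    sample = [rng.sample(range(n_cols), 3) for _ in range(200)]
    secs = time_one_iteration(sample, k, n_cols)
    # iterations=20 is an assumed value, not a measured one
    print(estimate_total_seconds(secs, len(sample), 500_000, iterations=20))
```

For the estimate to be meaningful, the sample should be run with the same number of clusters and columns as the full job, so the sample needs at least as many rows as clusters. The same idea applies to any tool: run it on, say, 1% of the rows, measure the time per iteration, and scale up.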
