ELKI Kmeans clustering "Task failed" error for high-dimensional data


Problem description

I have 60,000 documents, which I processed in gensim to get a 60000 x 300 matrix. I exported this matrix as a CSV file. When I import the file into ELKI and run Kmeans clustering, I get the error below.

Task failed
de.lmu.ifi.dbs.elki.data.type.NoSupportedDataTypeException: No data type found satisfying: NumberVector,field AND NumberVector,variable
Available types: DBID DoubleVector,variable,mindim=266,maxdim=300 LabelList
    at de.lmu.ifi.dbs.elki.database.AbstractDatabase.getRelation(AbstractDatabase.java:126)
    at de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(AbstractAlgorithm.java:81)
    at de.lmu.ifi.dbs.elki.workflow.AlgorithmStep.runAlgorithms(AlgorithmStep.java:105)
    at de.lmu.ifi.dbs.elki.KDDTask.run(KDDTask.java:112)
    at de.lmu.ifi.dbs.elki.application.KDDCLIApplication.run(KDDCLIApplication.java:61)
    at [...]
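
The "Available types" line reports vectors of variable length (mindim=266, maxdim=300), which suggests that ELKI did not read 300 numeric values from every row. One way to check the file itself is to count the fields per row that parse as numbers. The following is a minimal sketch, not part of the original post: the function name `numeric_field_counts` is hypothetical, and it assumes a plain comma-separated file with no header row.

```python
import csv
from collections import Counter


def numeric_field_counts(path, delimiter=","):
    """Return a Counter mapping 'number of float-parsable fields' to 'number of rows'."""
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            numeric = 0
            for field in row:
                try:
                    float(field)
                    numeric += 1
                except ValueError:
                    # Non-numeric token (text, empty field, stray characters)
                    pass
            counts[numeric] += 1
    return counts
```

If every row is well formed, the result should contain a single key (300 for this data set). Several different keys would point to rows with missing or non-numeric fields, which is consistent with the variable-dimension error. Note that Python's `float` also accepts tokens such as `nan` and `inf`, which another parser may treat differently.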

Below are the ELKI settings I used.

Recommended answer

This sounds strange, but I worked around the issue by opening the exported CSV file and re-saving it as a CSV file via Save As. The original file was 437 MB, while the re-saved file is 163 MB. I had used the NumPy function np.savetxt to save the doc2vec vectors, so this looks like a Python-side issue rather than an ELKI issue.

Edit: The workaround above turned out not to be useful. Instead, I exported the doc2vec output created with the gensim library and set the output format explicitly to %1.22e, i.e. the values are written in exponential notation with 22 digits after the decimal point. Below is the complete code.

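import numpy as np
# model is the trained gensim Doc2Vec model; doctag_syn0 holds one vector per document
textVect = model.docvecs.doctag_syn0
np.savetxt(r'D:\Backup\expo22.csv', textVect, delimiter=',', fmt='%1.22e')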

The CSV file created this way runs in ELKI without any issue.
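
Before handing an export like this to ELKI, it can help to confirm that the file reloads with the same shape it was written with. The following is a rough sketch under stated assumptions: the helper name `export_and_verify` and its parameters are hypothetical, and it works on any 2-D array of floats rather than on the gensim model directly.

```python
import numpy as np


def export_and_verify(vectors, path, fmt="%1.22e"):
    """Write vectors as CSV, reload the file, and check that the shape is unchanged."""
    vectors = np.asarray(vectors, dtype=float)
    np.savetxt(path, vectors, delimiter=",", fmt=fmt)
    # ndmin=2 keeps a 2-D result even if the file holds a single row
    reloaded = np.loadtxt(path, delimiter=",", ndmin=2)
    if reloaded.shape != vectors.shape:
        raise ValueError(f"shape changed: {vectors.shape} -> {reloaded.shape}")
    return reloaded
```

For the data in this question, the reloaded array would be expected to have shape (60000, 300). Reloading a file of that size with np.loadtxt can take a while, so this is meant as a one-off sanity check.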
