Visualize data and clustering


Problem description

I am currently writing a Python script to find the similarity between documents. I have already calculated the similarity score for each document pair and stored the scores in a dictionary. It looks something like this:

{(8328, 8327): 1.0, (8313, 8306): 0.12405229825691289, (8329, 8328): 1.0, (8322, 8321): 0.99999999999999989, (8328, 8329): 1.0, (8306, 8316): 0.12405229825691289, (8320, 8319): 0.67999999999999989, (8337, 8336): 1.0000000000000002, (8319, 8320): 0.67999999999999989, (8313, 8316): 0.99999999999999989, (8321, 8322): 0.99999999999999989, (8330, 8328): 1.0}

My final goal is to cluster similar documents together. The data above can be viewed in another way. Take the document pair (8313, 8306): its similarity score is 0.12405. I can specify that the inverse of the score is the distance between documents 8313 and 8306. Similar documents will then cluster closer together, while less similar documents will be further apart based on their distance.
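To make the similarity-to-distance idea concrete, here is a minimal sketch that turns a pair dictionary like the one above into a symmetric distance matrix. The helper name `to_distance_matrix` and its `missing_similarity` parameter are hypothetical. The sketch also makes two assumptions that are not in the question: it uses `1 - similarity` as the distance instead of a literal reciprocal (so that identical documents end up at distance 0), and it treats any pair missing from the dictionary as having similarity 0.

```python
import numpy as np

# Similarity scores copied from the question, keyed by (doc_id, doc_id).
similarities = {
    (8328, 8327): 1.0, (8313, 8306): 0.12405229825691289,
    (8329, 8328): 1.0, (8322, 8321): 0.99999999999999989,
    (8328, 8329): 1.0, (8306, 8316): 0.12405229825691289,
    (8320, 8319): 0.67999999999999989, (8337, 8336): 1.0000000000000002,
    (8319, 8320): 0.67999999999999989, (8313, 8316): 0.99999999999999989,
    (8321, 8322): 0.99999999999999989, (8330, 8328): 1.0,
}


def to_distance_matrix(similarities, missing_similarity=0.0):
    """Return (sorted doc ids, symmetric distance matrix).

    Assumption: distance = 1 - similarity, and pairs that are absent
    from the dictionary get `missing_similarity`.
    """
    ids = sorted({doc for pair in similarities for doc in pair})
    index = {doc: i for i, doc in enumerate(ids)}

    sim = np.full((len(ids), len(ids)), missing_similarity, dtype=float)
    np.fill_diagonal(sim, 1.0)  # a document is identical to itself
    for (a, b), score in similarities.items():
        # Write both cells so the matrix stays symmetric.
        sim[index[a], index[b]] = score
        sim[index[b], index[a]] = score

    # Clip to [0, 1] to absorb rounding noise such as 1.0000000000000002.
    dist = 1.0 - np.clip(sim, 0.0, 1.0)
    return ids, dist


ids, dist = to_distance_matrix(similarities)
```

If the literal reciprocal is preferred, the last line of the helper is the only place that needs to change, but note that a reciprocal is undefined for a similarity of 0.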

My question is: is there any open-source visualization tool that can help me achieve this?

Recommended answer

I think you have to use MDS (multidimensional scaling):

http://en.wikipedia.org/wiki/MultiDimension_scaling
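As a rough illustration of what MDS does, here is a sketch of classical (Torgerson) MDS written with NumPy only. It is one variant of MDS, not necessarily the one the answer had in mind. The function name `classical_mds` is hypothetical, and the small distance matrix is an assumption: it covers five documents from the question, uses `1 - similarity` as the distance, and sets every pair without a listed score to distance 1.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points in `n_components` dimensions from a distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]

    # Double-center the squared distances to get a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]

    # Negative eigenvalues mean the distances are not exactly Euclidean;
    # clip them to zero so the square root is defined.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


doc_ids = [8306, 8313, 8316, 8319, 8320]
far = 1.0                              # assumed distance for unlisted pairs
d_low = 1.0 - 0.12405229825691289      # pairs (8313, 8306) and (8306, 8316)
d_same = 0.0                           # pair (8313, 8316), similarity ~1.0
d_mid = 1.0 - 0.68                     # pair (8320, 8319)

dist = np.array([
    [0.0,   d_low,  d_low,  far,   far],
    [d_low, 0.0,    d_same, far,   far],
    [d_low, d_same, 0.0,    far,   far],
    [far,   far,    far,    0.0,   d_mid],
    [far,   far,    far,    d_mid, 0.0],
])

coords = classical_mds(dist, n_components=2)
for doc_id, (x, y) in zip(doc_ids, coords):
    print(doc_id, round(float(x), 3), round(float(y), 3))
```

Each row of `coords` is a 2D position, so the result can be handed to any scatter-plot tool (matplotlib is one open-source option) to see which documents sit close together. scikit-learn also ships an MDS estimator in `sklearn.manifold` if a maintained implementation is preferred over this sketch.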
