Measuring similarity between document sets

Problem description

For illustration purposes, let's assume this is a forum service. I need to calculate the "similarity" among each user's posts, so that the result would be something like:

among posts by user A, similarity 60%
among posts by user B, similarity 20%
...

I'm dealing with multibyte strings, so I guess I'm stuck with search engines here. We already use Solr and already have MoreLikeThis implemented, but I'm not quite sure how to construct the query. Any help appreciated!

Solution

Possibly Carrot2 will interest you (and this blog related to it)
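
For a rough idea of what a per-user similarity score could look like outside the search engine, here is a minimal sketch. It is not the Carrot2 or Solr approach mentioned above; it swaps in a plain technique: cosine similarity over character bigram counts, averaged across every pair of one user's posts. Character n-grams are used because they need no tokenizer, which may suit multibyte text. The function names, the bigram size, and the sample posts are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def char_ngrams(text, n=2):
    """Count overlapping character n-grams; works on any Unicode string."""
    text = "".join(text.split())  # drop whitespace so spacing does not matter
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def user_similarity(posts, n=2):
    """Mean pairwise cosine similarity over one user's posts (0.0 to 1.0)."""
    vectors = [char_ngrams(p, n) for p in posts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two posts: nothing to compare
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sample data, keyed by user name (not from the original question).
posts_by_user = {
    "A": ["東京の天気は晴れです", "東京の天気は雨です", "大阪の天気は晴れです"],
    "B": ["Solr の設定方法", "今日の昼ごはん"],
}

for user, posts in posts_by_user.items():
    print(f"among posts by user {user}, similarity {user_similarity(posts):.0%}")
```

Comparing every pair grows quadratically with the number of posts per user, so for prolific users it may be necessary to sample posts or to push the comparison into an index instead.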
