Finding similar posts with PostgreSQL


Question

I have a table of posts:

CREATE TABLE posts (
  id serial primary key,
  content text
);

When a user submits a post, how can I compare it with the other posts and find similar ones?
I'm looking for something like what StackOverflow does with its "Similar Questions".

Recommended answer

While Text Search (http://www.postgresql.org/docs/current/interactive/textsearch-controls.html#TEXTSEARCH-RANKING) is an option, it is not primarily meant for this type of search. Its typical use case is finding words in a document based on dictionaries and stemming, not comparing whole documents.

I am sure StackOverflow has put some smarts into its similarity search, as this is not a trivial matter.

You can get halfway decent results with the similarity function and operators provided by the pg_trgm module:

SELECT content, similarity(content, 'grand new title asking foo') AS sim_score
FROM   posts
WHERE  content  % 'grand new title asking foo'
ORDER  BY 2 DESC, content;

Be sure to have a GiST index on content for this (one built with the gist_trgm_ops operator class that pg_trgm provides), so the % operator can use it.
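As a minimal sketch of the setup the query above relies on, assuming you have the privileges to install extensions in the database; the index name posts_content_trgm_idx is only an illustrative choice, not something from the original answer:

```sql
-- Install the trigram module (provides similarity() and the % operator).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GiST index using the trigram operator class shipped with pg_trgm.
-- The index name is hypothetical; pick whatever fits your naming scheme.
CREATE INDEX posts_content_trgm_idx
    ON posts
 USING gist (content gist_trgm_ops);
```

The % operator only returns rows whose similarity exceeds the current threshold (0.3 by default). If that turns out to be too strict or too loose for long posts, it can be adjusted per session, for example with SET pg_trgm.similarity_threshold = 0.2 on PostgreSQL 9.6 or later.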

But you'll probably have to do more. You could combine it with Text Search after identifying keywords in the new content.
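One way such a combination might look, purely as a sketch: the keywords 'title' and 'foo' are assumed to have already been extracted from the new post by some step not shown here, and the 'english' text search configuration is likewise an assumption. Text Search filters the candidates, and the trigram similarity serves as a secondary ranking signal:

```sql
-- Hypothetical keywords ('title', 'foo') picked from the new post beforehand.
SELECT p.id,
       p.content,
       ts_rank(to_tsvector('english', p.content), q) AS ts_score,
       similarity(p.content, 'grand new title asking foo') AS sim_score
FROM   posts p,
       to_tsquery('english', 'title | foo') AS q
WHERE  to_tsvector('english', p.content) @@ q   -- keyword match via Text Search
ORDER  BY ts_score DESC, sim_score DESC
LIMIT  10;
```

How to weight the two scores against each other is a tuning question that depends on your data. For larger tables, an expression index such as a GIN index on to_tsvector('english', content) would typically be needed to keep the @@ filter fast.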
