Using Levenshtein function on each element in a tsvector?

Question

I'm trying to create a fuzzy search using Postgres and have been using django-watson as a base search engine to work from.

I have a field called search_tsv that is a tsvector containing all the field values of the model that I want to search on.

I wanted to use the levenshtein() function, which does exactly what I want on a text field. However, I don't really know how to run it on each individual element of the tsvector.

Is there a way to do this?

Recommended answer

I would consider using the extension pg_trgm instead of levenshtein(). It can be several orders of magnitude faster if you back it with a GiST index, which can make use of the KNN (nearest-neighbor) search feature introduced in PostgreSQL 9.1.

Install the extension once per database:

CREATE EXTENSION pg_trgm;

Then make use of the <-> (distance) or % (similarity) operator, or the similarity() function. Several good answers have been posted on SO already; search for pg_trgm [PostgreSQL] ...
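To make the index suggestion concrete, here is a minimal sketch of a trigram GiST index combined with a nearest-neighbor query. The table search_word and its column word are hypothetical names made up for illustration; they are not part of the question's schema, so adapt them to wherever your searchable strings actually live.

```sql
-- Hypothetical table holding one searchable word per row (assumption, not from the question).
CREATE TABLE search_word (
    word text NOT NULL
);

-- Trigram GiST index; gist_trgm_ops is the operator class provided by pg_trgm.
CREATE INDEX search_word_trgm_idx ON search_word USING gist (word gist_trgm_ops);

-- Nearest-neighbor search: ordering by the distance operator with a LIMIT
-- lets the planner use the GiST index (PostgreSQL 9.1 or later).
SELECT word, word <-> 'brat' AS trg_dist
FROM   search_word
ORDER  BY word <-> 'brat'
LIMIT  10;
```

Whether the planner actually picks the index depends on table size and statistics, so it is worth checking with EXPLAIN on real data.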

A wild guess at what you may want:

WITH x AS (
    -- provide tsvector, strip positions/weights, extract lexemes as plain strings
    SELECT unnest(string_to_array(trim(strip(
           'fat:2,4 cat:3 rat:5A'::tsvector)::text, ''''), ''' ''')) AS val
    )
   , y AS (SELECT 'brat'::text AS term)   -- provide term to match
SELECT val, term
     , (val <-> term) AS trg_dist         -- trigram distance operator (pg_trgm)
     , levenshtein(val, term) AS lev_dist -- needs the fuzzystrmatch extension
FROM   x, y;

Returns:

 val | term | trg_dist | lev_dist
-----+------+----------+----------
 cat | brat |    0.875 |        2
 fat | brat |    0.875 |        2
 rat | brat | 0.714286 |        1
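
On newer servers the string manipulation may not be needed. As a sketch, assuming PostgreSQL 9.6 or later (where unnest() accepts a tsvector and returns one row per lexeme) and with both pg_trgm and fuzzystrmatch installed, the same comparison could be written as follows. This is a variation on the query above, not part of the original answer.

```sql
-- Assumes PostgreSQL 9.6+ for unnest(tsvector);
-- pg_trgm provides <->, fuzzystrmatch provides levenshtein().
SELECT t.lexeme                      AS val
     , y.term
     , t.lexeme <-> y.term           AS trg_dist
     , levenshtein(t.lexeme, y.term) AS lev_dist
FROM   unnest('fat:2,4 cat:3 rat:5A'::tsvector) AS t
     , (SELECT 'brat'::text AS term) AS y
ORDER  BY trg_dist;
```

Reading the lexeme column directly avoids relying on the text representation of the tsvector, which the trim/string_to_array approach depends on.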
