Postgres反向LIKE查找索引和性能 [英] Postgres reverse LIKE lookup indexing and performance

查看:262
本文介绍了Postgres反向LIKE查找索引和性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们有一个音乐家表,其中包含多个字符串字段的记录,例如:

We have a musicians table containing records with multiple string fields, say:


  • Jimi,Hendrix,Guitar

  • Phil,Collins,Drums

  • Sting,,Bass

  • Ringo,Starr,Drums

  • Paul,McCartney,Bass

  • "Jimi", "Hendrix", "Guitar"
  • "Phil", "Collins", "Drums"
  • "Sting", "", "Bass"
  • "Ringo", "Starr", "Drums"
  • "Paul", "McCartney", "Bass"

我想传递postgres长字符串,比如说:

I want to pass postgres a long string, say:


众所周知,Jimi喜欢在舞台上为他的吉他设置灯光并将所有鼓声打成

"It is known that Jimi liked to set light to his guitar and smash up all the drums while on stage."

我希望得到回复有任何匹配的字段 - 最好按照最匹配的顺序排列:

and i want to get returned the fields that have any matches - preferably in order of the most matches first:


  • Jimi , Hendrix,吉他

  • Phil,Collins,

  • Ringo,Starr,

  • "Jimi", "Hendrix", "Guitar"
  • "Phil", "Collins", "Drums"
  • "Ringo", "Starr", "Drums"

因为我需要搜索到不区分大小写,我正在构建这样的查询......

because i need the search to be case insensitive, i'm constructing a query like this...

select * from musicians where lowercase_string like '%'||firstname||'%' or  lowercase_string like '%'||lastname||'%' or lowercase_string like '%'||instrument||'%'

然后循环(在我的情况下用ruby来捕获最匹配的结果。

and then looping through (in ruby in my case) to capture the result with the most matches.

然而这在sql阶段(1分钟+)非常慢。

this is however very slow in the sql stage (1 minute+).

我尝试使用pg_trgm添加小写GIN索引这里 - 但它没有帮助 - 大概是因为类似的查询回到了前面?

i've tried adding lower-case GIN index using pg_trgm as suggested here - but it's not helping - presumably because the like query is back to front?

谢谢!

推荐答案

通过我的测试,似乎没有三元组索引可以帮助您查询 。并且没有其他索引类型可能加速基于(I)LIKE / FTS的搜索。

With my testing, it seems that no trigram index could help your query at all. And no other index type could possibly speed up an (I)LIKE / FTS based search.

我应该提到下面的所有查询都使用三元组索引,当它们被查询反向:当表包含文档(已编制索引)时,您的参数是查询。 (I)LIKE变体变体f.ex.使用它快2-3倍。

I should mention that all of the queries below use the trigram indexes, when they are queried "reversed": when the table contains the document (which is indexed), and your parameter is the query. The (I)LIKE variant variant f.ex. 2-3 times faster with it.

我测试的这些查询:

select *
from   musicians
where  :input_string ilike '%' || firstname  || '%'
or     :input_string ilike '%' || lastname   || '%'
or     :input_string ilike '%' || instrument || '%'

起初,FTS似乎是一个好主意,但我的测试显示即使没有排名,它比(I)LIKE变体慢60-100倍。 (所以即使你不需要用这些方法对结果进行后处理,这些也不值得。)

At first, FTS seemed a great idea, but my testing shows that even without ranking, it is 60-100 times slower than the (I)LIKE variant. (So even, when you don't have to post-process results with these methods, these are not worth it).

select *
from   musicians
where  to_tsvector(:input_string) @@ (plainto_tsquery(firstname) || plainto_tsquery(lastname) || plainto_tsquery(lastname))

然而, ORDER BY rank 并没有进一步放缓:它是70-120倍慢于(I)LIKE变体。

However, ORDER BY rank doesn't slow down that much further: it is 70-120 times slower than the (I)LIKE variant.

select   *
from     musicians
where    to_tsvector(:input_string) @@ (plainto_tsquery(firstname) || plainto_tsquery(lastname) || plainto_tsquery(lastname))
order by ts_rank(to_tsvector(:input_string), plainto_tsquery(firstname) || plainto_tsquery(lastname) || plainto_tsquery(lastname))

然后,为了最后的努力,我尝试了(相当新的)单词相似度运算符trigram模块:<%%> (可从PostgreSQL 9.6获得)。

Then, for a last effort, I tried the (fairly new) "word similarity" operators of the trigram module: <% and %> (available from PostgreSQL 9.6).

select *
from   musicians
where  :input_string %> firstname
or     :input_string %> lastname
or     :input_string %> instrument

select *
from   musicians
where  firstname  <% :input_string
or     lastname   <% :input_string
or     instrument <% :input_string

这些比FTS快一些:比(I)LIKE变种慢约50-70倍。

These were somewhat faster then FTS: around 50-70 times slower than the (I)LIKE variant.

(部分工作) rextester :它运行针对PostgreSQL 9.5,所以9.6运营商显然不会在这里运行。

(Partially working) rextester: it is run against PostgreSQL 9.5, so the 9.6 operators obviously won't run here.

更新 IF 全字匹配对您来说足够了,您实际上可以反转您的查询,以便能够使用索引。你需要解析你的查询(也就是长字符串):

Update: IF full word match is enough for you, you can actually reverse your query, to be able to use indexes. You'll need to "parse" your query (aka. "long string") though:

with long_string(ls) as (
  values (:input_string)
),
words(word) as (
  select s
  from   long_string, regexp_split_to_table(ls, '[^[:alnum:]]+') s
  where  s <> ''
)
select   musicians.*
from     musicians, words
where    firstname  ilike word
or       lastname   ilike word
or       instrument ilike word
group by musicians.id

注意:我为每个完整的单词解析了查询。你可以在那里有其他逻辑,或者它甚至可以在客户端解析。

Note: I parsed the query for every complete word. You can have some other logic there, or it can even be parsed in client side.

默认值, btree 索引在这里闪耀,因为它比(I)LIKE的三元组索引快得多(我们无论如何都不需要它们,因为我们在这里寻找完整的单词匹配):

The default, btree index shines here, as it is much faster than the trigram index with (I)LIKE (we won't need them anyway, as we are looking for complete word match here):

with long_string(ls) as (
  values (:input_string)
),
words(word) as (
  select s
  from   long_string, regexp_split_to_table(lower(ls), '[^[:alnum:]]+') s
  where  s <> ''
)
select   musicians.*
from     musicians, words
where    lower(firstname)  = word
or       lower(lastname)   = word
or       lower(instrument) = word
group by musicians.id

http://rextester.com/PSABJ6745

您甚至可以通过类似

sum((lower(firstname)  = word)::int
  + (lower(lastname)   = word)::int
  + (lower(instrument) = word)::int)

这篇关于Postgres反向LIKE查找索引和性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆