Proper full text index Rails/PostgreSQL/pg_search


Question

I am testing the performance of PostgreSQL full text search (using the pg_search gem) against Solr (using the sunspot_solr gem).

For 4 million records I am getting 13456 ms for Tsearch and 800 ms with Solr (that is, Solr query + DB retrieval). It is obvious that I need an index, but I am not sure how to create one for full text search. I investigated and found that for full text search I should use a GIN index:

execute "CREATE INDEX products_gin_title ON products USING GIN(to_tsvector('english', title))"

But I am also searching by two more columns, so I need a multi-column index, and I am not sure how to implement it. I am not very familiar with the DB side. My search code looks like:

@results = Product.search_title(params[:search_term]).where("platform_id=? AND product_type=?", params[:platform_id], params[:type_id]).limit(10).all

How do I create a proper query for this kind of situation?

Here is the SQL output from Rails for the search term car.

Product Load (12494.0ms)
SELECT 
    "products".*, 
    ( ts_rank((to_tsvector('simple', coalesce("products"."title"::text, ''))), (to_tsquery('simple', ''' ' || 'car' || ' ''')), 2) ) AS pg_search_rank 
FROM "products" 
WHERE (((to_tsvector('simple', coalesce("products"."title"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'car' || ' ''')))) 
    AND (platform_id='26' AND product_type='2') 
ORDER BY pg_search_rank DESC, "products"."id" ASC 
LIMIT 10

Edit:

I am using PostgreSQL 8.4.11. The EXPLAIN ANALYZE output follows.

Limit  (cost=108126.34..108126.36 rows=10 width=3824) (actual time=12228.736..12228.738 rows=10 loops=1)   
->  Sort (cost=108126.34..108163.84 rows=14999 width=3824) (actual time=12228.733..12228.734 rows=10 loops=1)
    Sort Key: (ts_rank(to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)), '''car'''::tsquery, 2)), id
    Sort Method:  top-N heapsort  Memory: 18kB
    ->  Seq Scan on products  (cost=0.00..107802.22 rows=14999 width=3824) (actual time=7.532..12224.585 rows=977 loops=1)
        Filter: ((platform_id = 26) AND (product_type = 2) AND (to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)) @@ '''car'''::tsquery)) 

Total runtime: 12228.813 ms


Recommended answer

This expression:

to_tsvector('simple', COALESCE(title::TEXT, ''))

is not sargable against your index, which is declared on to_tsvector('english', title).

You should declare the index on exactly the expression that is used in the query:

CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', COALESCE(title::TEXT,'')))

(or make Ruby generate the expression that the index is declared on).
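One way to keep the index and the query in step is to build the index statement from a single expression string. The sketch below is only an illustration: `tsvector_expression` and `gin_index_sql` are hypothetical helper names, not part of Rails or pg_search, and the expression mirrors the one in the logged SQL from the question.

```ruby
# Hypothetical helper: the tsvector expression seen in the logged query,
# to_tsvector('simple', coalesce(<column>::text, '')).
def tsvector_expression(column, dictionary: "simple")
  "to_tsvector('#{dictionary}', coalesce(#{column}::text, ''))"
end

# Hypothetical helper: a CREATE INDEX statement for a GIN index on that
# same expression, suitable for passing to `execute` in a migration.
def gin_index_sql(table:, column:, dictionary: "simple")
  index_name = "#{table}_gin_#{column}"
  "CREATE INDEX #{index_name} ON #{table} " \
    "USING GIN(#{tsvector_expression(column, dictionary: dictionary)})"
end

puts gin_index_sql(table: "products", column: "title")
```

In a migration this string would be passed to execute, as in the question's own CREATE INDEX line. The dictionary has to be spelled out in the index expression, because only the two-argument form of to_tsvector can be used in an expression index.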

If you want multiple columns to be indexed, just concatenate them:

CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', title || ' ' || product_type || ' ' || platform_id))

But again, Ruby should be filtering on exactly the same expression for the index to be of use.
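As a rough sketch of that idea, the same string can feed both the index definition and a hand-written filter, so the two cannot drift apart. Several things here are assumptions rather than part of the answer: the helper name `multi_column_tsvector`, the index name `products_gin_search`, and the added coalesce and ::text casts (a variant of the plain concatenation above, assuming the extra columns are integer or character columns that may contain NULLs).

```ruby
# Hypothetical helper: one tsvector expression over several columns.
# Each column is cast to text and NULLs become '', so one NULL column
# does not turn the whole concatenation into NULL.
def multi_column_tsvector(columns, dictionary: "simple")
  parts = columns.map { |c| "coalesce(#{c}::text, '')" }
  "to_tsvector('#{dictionary}', #{parts.join(" || ' ' || ")})"
end

expression = multi_column_tsvector(%w[title product_type platform_id])

# The index and the filter are built from the same expression string.
index_sql = "CREATE INDEX products_gin_search ON products USING GIN(#{expression})"
where_sql = "#{expression} @@ to_tsquery('simple', ?)"

puts index_sql
puts where_sql
```

The where_sql string could then be passed to Product.where together with the search term, in place of a scope that generates a different expression.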
