Use of full-text search + GIN in a view (Django 1.11)


Question

I need help building a proper query in a Django view for full-text search backed by a GIN index. My database is fairly large (~400k rows) and I need full-text search over three of its fields. I first followed the search section of the Django docs; the code BEFORE GIN is shown below. It works, but a search over all the fields takes 6+ seconds. Next I tried adding a GIN index to speed up the search. There are already plenty of questions about how to build the index, so my question is: how does the view query change when a GIN index is used for search? Which fields should I search?

Before GIN:

models.py

class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True, )
    shop = models.ForeignKey('Shop', to_field='shop_name')
    brand = models.ForeignKey('Brand', to_field='brand_name')
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)

views.py

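from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
from django.shortcuts import render

def get_cosmetic(request):
    if request.method == "GET":
        pass
    else:
        search_words = request.POST.get('search')
        # Weighted vector computed on the fly for every row (the slow part).
        search_vectors = (SearchVector('title', weight='B')
                          + SearchVector('description', weight='C')
                          + SearchVector('brand__brand_name', weight='A'))
        # SearchRank needs a query; the original passed an undefined name `search`.
        search_query = SearchQuery(search_words)

        products = (Product.objects
                    .annotate(search=search_vectors, rank=SearchRank(search_vectors, search_query))
                    .filter(search=search_query)
                    .order_by('-rank'))

        return render(request, 'example.html', {"products": products})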

After GIN:

models.py

from django.contrib.postgres import indexes, search as pg_search
from django.db import models


class ProductManager(models.Manager):
    def with_documents(self):
        vector = (pg_search.SearchVector('brand__brand_name', weight='A')
                  + pg_search.SearchVector('title', weight='A')
                  + pg_search.SearchVector('description', weight='C'))
        return self.get_queryset().annotate(document=vector)


class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True, )
    shop = models.ForeignKey('Shop', to_field='shop_name')
    brand = models.ForeignKey('Brand', to_field='brand_name')
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)

    search_vector = pg_search.SearchVectorField(null=True)

    objects = ProductManager()

    class Meta:
        indexes = [
            indexes.GinIndex(fields=['search_vector'], name='title_index')
        ]

    #update search_vector every time the entry updates
    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        if 'update_fields' not in kwargs or 'search_vector' not in kwargs['update_fields']:
            instance = self._meta.default_manager.with_documents().get(pk=self.pk)
            instance.search_vector = instance.document
            instance.save(update_fields=['search_vector'])

views.py

def get_cosmetic(request):
    if request.method == "GET":
        pass
    else:
        search_words = request.POST.get('search')
        products = ?????????
        return render(request, 'example.html', {"products": products})


Recommended answer

Answering my own question:

products = (Product.objects
            .annotate(rank=SearchRank(F('search_vector'), search_words))
            .filter(search_vector=search_words)
            .order_by('-rank'))


This means you should search on the indexed field, in my case the search_vector field.
I have also changed the code in the ProductManager() class a bit, so now I can just use

products = Product.objects.with_documents(search_words)


Where with_documents() is a custom method of the custom ProductManager(). The recipe for this change is here (page 30).
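As a rough illustration of why the view should filter on the stored search_vector column: the pre-GIN view builds a vector for every row on each request, while an index lookup goes straight to the matching entries. The sketch below is a plain-Python toy, not how PostgreSQL implements GIN internally; the rows data and all helper names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for Product records (primary key -> text).
rows = {
    1: "red lipstick matte",
    2: "moisturizing face cream",
    3: "matte foundation cream",
}


def tokenize(text):
    """Lowercase and split on whitespace (a real tsvector also stems words)."""
    return text.lower().split()


def scan_search(rows, word):
    """Pre-index approach: re-tokenize every row on every query."""
    return sorted(pk for pk, text in rows.items() if word in tokenize(text))


def build_index(rows):
    """Index approach: tokenize once, map each word to the rows containing it."""
    index = defaultdict(set)
    for pk, text in rows.items():
        for token in tokenize(text):
            index[token].add(pk)
    return index


def index_search(index, word):
    """Look the word up directly instead of visiting every row."""
    return sorted(index.get(word, set()))


index = build_index(rows)
```

Both functions return the same rows; the difference is that scan_search does work proportional to the table size on every request, while index_search only does the tokenizing work once, when the index is built.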

What all this code does:
1) Creates search_vector with per-field weights; a match in a field with a higher weight is placed higher in the sorted results.
2) Creates a GIN index for full-text search through the Django ORM.
3) Updates the stored search_vector (and with it the GIN index) every time a model instance is saved.
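To make point 1 concrete, here is a small plain-Python sketch of how weight labels affect ordering. It uses PostgreSQL's default numeric weights for the labels D, C, B, A (0.1, 0.2, 0.4, 1.0), but the scoring itself is a deliberate simplification: the real ts_rank also takes term frequency into account. The function name toy_rank and the sample products are hypothetical.

```python
# PostgreSQL's default ts_rank weights for the labels D, C, B and A.
WEIGHTS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}


def toy_rank(fields, word):
    """Sum the weight of every labelled field that contains the word.

    `fields` is a list of (weight_label, text) pairs, mirroring the
    SearchVector(..., weight=...) calls in the model manager.
    """
    word = word.lower()
    return sum(
        WEIGHTS[label]
        for label, text in fields
        if word in text.lower().split()
    )


# Hypothetical products: brand and title carry weight A, description weight C.
brand_match = [("A", "Nivea"), ("A", "Soft cream"), ("C", "light daily care")]
description_match = [("A", "Other"), ("A", "Day cream"), ("C", "similar to nivea")]

ranked = sorted(
    [("brand_match", toy_rank(brand_match, "nivea")),
     ("description_match", toy_rank(description_match, "nivea"))],
    key=lambda item: item[1],
    reverse=True,
)
```

With these inputs the product whose brand matches scores 1.0 and the one that only mentions the word in its description scores 0.2, so the brand match sorts first, which is the behaviour order_by('-rank') relies on.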

What this code doesn't do:
1) It doesn't sort by the relevance of a queried substring. Possible solution.
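The solution linked above is not reproduced here. One commonly used option for substring-like or misspelled queries in PostgreSQL is trigram similarity (the pg_trgm extension, which Django exposes as TrigramSimilarity in django.contrib.postgres.search); whether that is what the author had in mind is an assumption. The plain-Python sketch below only illustrates the idea behind the measure. It pads each word the way pg_trgm does, but it does not strip punctuation, so treat it as an approximation.

```python
def trigrams(text):
    """Return the set of three-character sequences in `text`.

    Each word is padded with two spaces in front and one behind,
    following pg_trgm's convention.
    """
    result = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

A misspelled query such as "lipstik" still shares most of its trigrams with "lipstick", so it scores well above an unrelated word, whereas an exact-lexeme full-text match would find nothing.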

Hope this helps somebody with a somewhat complicated full-text search in Django.
