Use of full-text search + GIN in a view (Django 1.11)
Question
I need some help building a proper query in a Django view for full-text search using a GIN index. I have quite a big database (~400k rows) and need to do a full-text search on 3 of its fields. I tried the search described in the Django docs; this is the code BEFORE GIN. It works, but takes 6+ seconds to search over all fields. Next I tried to implement a GIN index to speed up my search. There are already a lot of questions about how to build one. But my question is: how does the view query change when using a GIN index for search? Which fields should I search?
Before GIN:
models.py
class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True)
    shop = models.ForeignKey("Shop", to_field="shop_name")
    brand = models.ForeignKey("Brand", to_field="brand_name")
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)
views.py
from django.contrib.postgres.search import SearchRank, SearchVector
from django.shortcuts import render

def get_cosmetic(request):
    if request.method == "GET":
        pass
    else:
        search_words = request.POST.get("search")
        # Build the weighted document at query time (recomputed for every row)
        search_vectors = (
            SearchVector("title", weight="B")
            + SearchVector("description", weight="C")
            + SearchVector("brand__brand_name", weight="A")
        )
        products = (
            Product.objects.annotate(
                # Rank each row against the submitted search words
                search=search_vectors, rank=SearchRank(search_vectors, search_words)
            )
            .filter(search=search_words)
            .order_by("-rank")
        )
        return render(request, "example.html", {"products": products})
After GIN:
models.py
from django.contrib.postgres import indexes
from django.contrib.postgres import search as pg_search
from django.db import models

class ProductManager(models.Manager):
    def with_documents(self):
        # Weighted document combining the brand name, title and description
        vector = (
            pg_search.SearchVector("brand__brand_name", weight="A")
            + pg_search.SearchVector("title", weight="A")
            + pg_search.SearchVector("description", weight="C")
        )
        return self.get_queryset().annotate(document=vector)

class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True)
    shop = models.ForeignKey("Shop", to_field="shop_name")
    brand = models.ForeignKey("Brand", to_field="brand_name")
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)
    search_vector = pg_search.SearchVectorField(null=True)
    objects = ProductManager()

    class Meta:
        indexes = [
            indexes.GinIndex(
                fields=["search_vector"],
                name="title_index",
            ),
        ]

    # update search_vector every time the entry updates
    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        # Skip the refresh when this call is itself the search_vector update
        if (
            "update_fields" not in kwargs
            or "search_vector" not in kwargs["update_fields"]
        ):
            instance = (
                self._meta.default_manager
                .with_documents().get(pk=self.pk)
            )
            instance.search_vector = instance.document
            instance.save(update_fields=["search_vector"])
views.py
def get_cosmetic(request):
    if request.method == "GET":
        pass
    else:
        search_words = request.POST.get('search')
        products = ?????????
        return render(request, 'example.html', {"products": products})
Recommended answer
Answering my own question:
from django.contrib.postgres.search import SearchRank
from django.db.models import F

products = (
    Product.objects.annotate(rank=SearchRank(F("search_vector"), search_words))
    .filter(search_vector=search_words)
    .order_by("-rank")
)
This means you should search the indexed field, which in my case is the search_vector field.
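To illustrate why querying the stored search_vector column helps, here is a minimal pure-Python sketch. It is not PostgreSQL's actual GIN implementation, only the general idea of an inverted index. The toy rows (ROWS) and the helper names (scan_search, build_index, index_search) are hypothetical and not part of the original code. The first function tokenizes every row on each query, roughly like building a SearchVector at query time; the second tokenizes each row once and stores a term-to-rows mapping, so a lookup no longer touches every row.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for Product.title values.
ROWS = {
    1: "matte red lipstick",
    2: "hydrating face cream",
    3: "red nail polish",
}


def scan_search(rows, term):
    """Tokenize every row at query time, like an unindexed search."""
    return sorted(pk for pk, text in rows.items() if term in text.lower().split())


def build_index(rows):
    """Tokenize each row once and store a term -> set of row ids mapping."""
    index = defaultdict(set)
    for pk, text in rows.items():
        for token in text.lower().split():
            index[token].add(pk)
    return index


def index_search(index, term):
    """Look the term up directly instead of visiting every row."""
    return sorted(index.get(term, set()))


INDEX = build_index(ROWS)
```

Both functions return the same rows for the same term; the difference is that the indexed lookup does its tokenizing work once, when the index is built, instead of on every search.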
I also changed the code in the ProductManager() class a bit, so now I can just use
products = Product.objects.with_documents(search_words)
where with_documents() is a custom method of the custom ProductManager(). The recipe for this change is here (page 29).
What all this code does:
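- Creates a search_vector with weights assigned to the fields; matches in fields with a higher weight get a higher position in the sorted results.
- Creates a GIN index for full-text search through the Django ORM.
- Updates the GIN index every time a model instance is changed.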
What this code doesn't do:
- It doesn't sort by the relevance of the queried substring. Possible solution.
Hope this helps someone doing somewhat complex full-text search in Django.