Creating a Gin Index with Trigram (gin_trgm_ops) in Django model


Problem description

The new TrigramSimilarity feature of django.contrib.postgres was great for a problem I had. I use it for a search bar to find hard-to-spell Latin names. The problem is that there are over 2 million names, and the search takes longer than I want.

I'd like to create an index on the trigrams as described in the PostgreSQL documentation: https://www.postgresql.org/docs/9.6/static/pgtrgm.html

But I am not sure how to do this in a way that the Django API would make use of it. For PostgreSQL full-text search the Django docs describe how to create an index, but not for trigram similarity: https://docs.djangoproject.com/en/1.11/ref/contrib/postgres/search/#performance

This is what I have right now:

from django.contrib.postgres.indexes import GinIndex
from django.db import models

class NCBI_names(models.Model):
    tax_id = models.ForeignKey(NCBI_nodes, on_delete=models.CASCADE, default=0)
    name_txt = models.CharField(max_length=255, default='')
    name_class = models.CharField(max_length=32, db_index=True, default='')

    class Meta:
        indexes = [GinIndex(fields=['name_txt'])]

In the view's get_queryset method:

from django.contrib.postgres.search import TrigramSimilarity
from django.shortcuts import redirect
from django.views.generic import ListView

class TaxonSearchListView(ListView):
    # form_class = TaxonSearchForm
    template_name = 'collectie/taxon_list.html'
    paginate_by = 20
    model = NCBI_names
    context_object_name = 'taxon_list'

    def dispatch(self, request, *args, **kwargs):
        query = request.GET.get('q')
        if query:
            try:
                # Redirect straight to the detail page on an exact (case-insensitive) match.
                tax_id = self.model.objects.get(name_txt__iexact=query).tax_id.tax_id
                return redirect('collectie:taxon_detail', tax_id)
            except (self.model.DoesNotExist, self.model.MultipleObjectsReturned):
                pass  # no single exact match: fall through to the list view
        return super(TaxonSearchListView, self).dispatch(request, *args, **kwargs)

    def get_queryset(self):
        result = super(TaxonSearchListView, self).get_queryset()
        query = self.request.GET.get('q')
        if query:
            # Skip names containing 'sp.', then rank the rest by trigram similarity.
            result = result.exclude(name_txt__icontains='sp.')
            result = (
                result.annotate(similarity=TrigramSimilarity('name_txt', query))
                .filter(similarity__gt=0.3)
                .order_by('-similarity')
            )
        return result
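
For reference, the score that TrigramSimilarity returns (and that the 0.3 threshold above is compared against) can be approximated in plain Python. This is only a rough sketch of what pg_trgm does, assuming simple alphanumeric words; the function names are made up for illustration and the real extension handles locale and encoding details that are ignored here.

```python
import re

def trigrams(text):
    """Approximate pg_trgm's trigram set for a string.

    Each lowercased alphanumeric word is padded with two leading spaces
    and one trailing space before the 3-character windows are taken.
    """
    result = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(sorted(trigrams("cat")))
print(round(similarity("word", "two words"), 6))  # 4 shared / 11 distinct
```

With this definition, `similarity("word", "two words")` comes out at 4/11, roughly 0.36, so it would just pass a 0.3 cutoff.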

Solution

Inspired by an old article on this subject, I arrived at a more recent one that gives a solution for a GistIndex (the GistIndexTrgrmOps code further down, after the update).

Update: From Django 1.11 things seem to be simpler, as this answer and the Django docs suggest:

from django.contrib.postgres.indexes import GinIndex

class MyModel(models.Model):
    the_field = models.CharField(max_length=512, db_index=True)

    class Meta:
        indexes = [GinIndex(fields=['the_field'])]

From Django 2.2, an opclasses attribute is available in class Index(fields=(), name=None, db_tablespace=None, opclasses=()) for this purpose.


from django.contrib.postgres.indexes import GistIndex

class GistIndexTrgrmOps(GistIndex):
    def create_sql(self, model, schema_editor):
        # - this Statement is instantiated by the _create_index_sql()
        #   method of django.db.backends.base.schema.BaseDatabaseSchemaEditor.
        #   using sql_create_index template from
        #   django.db.backends.postgresql.schema.DatabaseSchemaEditor
        # - the template has original value:
        #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s)%(extra)s"
        statement = super().create_sql(model, schema_editor)
        # - however, we want to use a GiST index to accelerate trigram
        #   matching, so we want to add the gist_trgm_ops operator class
        #   after the column list
        # - so we replace the template with:
        #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"
        statement.template = (
            "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"
        )

        return statement

You can then use it in your model class like this:

class YourModel(models.Model):
    some_field = models.TextField(...)

    class Meta:
        indexes = [
            GistIndexTrgrmOps(fields=['some_field'])
        ]
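
To see what the modified template in GistIndexTrgrmOps expands to, it can be filled in by hand with ordinary % formatting. The values below are hypothetical stand-ins for what Django's schema editor would supply (quoted identifiers and a " USING gist" clause); the exact template string differs between Django versions, so treat this as an illustration of the idea rather than the SQL any particular release emits.

```python
# The template string assigned in GistIndexTrgrmOps.create_sql().
template = (
    "CREATE INDEX %(name)s ON %(table)s%(using)s "
    "(%(columns)s gist_trgm_ops)%(extra)s"
)

# Hypothetical placeholder values, for illustration only.
params = {
    "name": '"yourmodel_some_field_idx"',
    "table": '"app_yourmodel"',
    "using": " USING gist",
    "columns": '"some_field"',
    "extra": "",
}

sql = template % params
print(sql)
# CREATE INDEX "yourmodel_some_field_idx" ON "app_yourmodel" USING gist ("some_field" gist_trgm_ops)
```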
