Combine trigram with ranked searching in Django 1.10


Question

We are working on search in Django 1.10, and we need to combine ranked searching with trigram searching.

Our code looks like this:

def get_queryset(self):
    search = self.request.GET.get('text', '')
    vector = SearchVector(
        'name',
        weight='A',
        config=settings.SEARCH_LANGS[settings.LANGUAGE],
    ) + SearchVector(
        'content',
        weight='B',
        config=settings.SEARCH_LANGS[settings.LANGUAGE],
    )
    query = SearchQuery(search)
    return Article.objects.annotate(
        rank=SearchRank(vector, query),
        similarity=TrigramSimilarity(
            'name', search
        ) + TrigramSimilarity(
            'content', search
        ),
    ).filter(
        rank__gte=0.3
    ).filter(
        similarity__gt=0.3
    ).order_by('-similarity')[:20]

However, this code returns no results. Without the trigram filter we have no problems, but when the two are combined the query comes back empty.

How can we combine trigram and ranked searching in Django 1.10?

Recommended answer

We investigated more thoroughly to understand how search weights work.

According to the documentation, weights can be assigned per field in a ranked search, and similarly trigrams can be used to filter by similarity or distance.
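
To make "similarity" concrete, here is a minimal pure-Python sketch of how PostgreSQL's pg_trgm extension (which backs Django's TrigramSimilarity) scores two strings. It assumes the documented behavior: lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, and compare the resulting sets of trigrams. The names `trigrams` and `trigram_similarity` are illustrative only and are not part of Django or PostgreSQL; the real extension may differ in edge cases such as non-ASCII text.

```python
import re


def trigrams(text):
    """Return the set of trigrams for text, roughly as pg_trgm builds them."""
    result = set()
    # Words are runs of letters and digits; anything else separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# 'word' and 'two words' share 4 of 11 distinct trigrams.
print(trigram_similarity("word", "two words"))
```

Under this formula, a short query compared against a long text field shares only a small fraction of the field's trigrams, so the score tends to be low. The distance is simply 1 minus the similarity.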

However, the documentation gives no example of using the two together, and further investigation did not make it much clearer how the weights work.

A little logic tells us that if we search for a word that is common to all records, they will all get rank 0. Similarity varies much more than rank, although it tends toward the lower values of its range.

Now, as far as we understand, text search is driven by the text contained in the fields being filtered more than by the language set in the configuration. For example, the model we used had a title field and a content field, and the most common words in the titles were 'como' and 'cambiar' ('how' and 'change'). Rank and similarity are annotations on the queryset, so we can use values or values_list to review them as numeric values, and the weighted words can be seen by inspecting the search vector. We saw that weights were assigned, but to split forms of the words: we found 'perfil' and 'cambi', but not 'cambiar' or 'como'. However, every record also contained the same 'lorem ipsun ...' text, and all the words of that sentence appeared whole and with weight B. From this we conclude that searches depend on the contents of the fields more than on the language with which we configure search.

That said, here is the code we use for everything.

First, we need to enable the trigram extension in the database:

from __future__ import unicode_literals

from django.contrib.postgres.operations import TrigramExtension, UnaccentExtension
from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        # ... (other operations elided in the original)
        TrigramExtension(),   # installs pg_trgm, needed by TrigramSimilarity
        UnaccentExtension(),  # installs unaccent
    ]

Import the extension operations from the postgres contrib package and run them from any migration file.

The next step is to change the code from the question so that the filter still returns results from one condition if the other fails:

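from django.conf import settings
from django.contrib.postgres.search import (
    SearchQuery, SearchRank, SearchVector, TrigramSimilarity,
)
from django.db.models import Q

# Article is the project's own model, with custom managers `actives` and `publics`.


# View method; the enclosing class is omitted, as in the question.
def get_queryset(self):
    # Keep the raw text as well: TrigramSimilarity compares against a string,
    # while SearchRank needs a SearchQuery.
    search = self.request.GET.get('q', '')
    search_query = SearchQuery(search)

    vector = SearchVector(
        'name',
        weight='A',
        config=settings.SEARCH_LANGS[settings.LANGUAGE_CODE],
    ) + SearchVector(
        'content',
        weight='B',
        config=settings.SEARCH_LANGS[settings.LANGUAGE_CODE],
    )

    if self.request.user.is_authenticated:
        queryset = Article.actives.all()
    else:
        queryset = Article.publics.all()

    return queryset.annotate(
        rank=SearchRank(vector, search_query),
        similarity=TrigramSimilarity(
            'name', search
        ) + TrigramSimilarity(
            'content', search
        ),
    ).filter(
        # Either condition is enough for a row to be returned.
        Q(rank__gte=0.3) | Q(similarity__gt=0.3)
    ).order_by('-rank')[:20]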

The problem with the code in the question was that it applied one filter after the other, so a row had to pass both; if the chosen word did not score well in either of the two searches, the problem was even greater. We use Q objects to filter with an OR connector, so that if one of the two conditions does not return the desired value, the other is used in its place.
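
The difference between the two filters can be sketched without a database. The rows below are invented (rank, similarity) pairs, not measurements from the original project. The sketch assumes that chained `.filter()` calls on these annotations behave like AND, while `Q(...) | Q(...)` behaves like OR.

```python
# Hypothetical annotated rows: (title, rank, similarity). All values are invented.
rows = [
    ("exact word match", 0.60, 0.10),
    ("misspelled match", 0.00, 0.45),
    ("unrelated", 0.00, 0.05),
]

THRESHOLD = 0.3

# Like .filter(rank__gte=0.3).filter(similarity__gt=0.3): both must hold.
both = [title for title, rank, sim in rows
        if rank >= THRESHOLD and sim > THRESHOLD]

# Like .filter(Q(rank__gte=0.3) | Q(similarity__gt=0.3)): either is enough.
either = [title for title, rank, sim in rows
          if rank >= THRESHOLD or sim > THRESHOLD]

print(both)    # []
print(either)  # ['exact word match', 'misspelled match']
```

With the AND form, a row that ranks well on full-text search but has low trigram similarity (or the reverse) is dropped, which is one way the original query can come back empty.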

This is enough for our case, but deeper clarifications of how these weights and trigrams work are welcome, so that we can make the most of this new feature offered by the latest version of Django.
