Django Haystack Distinct Value for Field

Problem description

I am building a small search engine using Django Haystack + Elasticsearch + Django REST Framework, and I'm trying to figure out how to reproduce the behavior of a Django QuerySet's distinct method.

My index looks something like this:

from haystack import indexes

class ItemIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    item_id = indexes.IntegerField(faceted=True)

    def prepare_item_id(self, obj):
        return obj.item_id

What I'd like to be able to do is the following:

sqs = SearchQuerySet().filter(content=my_search_query).distinct('item_id')

However, Haystack's SearchQuerySet doesn't have a distinct method, so I'm kind of lost. I tried faceting the field and then querying Django using the returned list of item_id values, but this loses the performance of Elasticsearch, and also makes it impossible to use Elasticsearch's sorting features.

Any thoughts?

EDIT:

Example data:

Item Model
==========

id  title
1   'Item 1'
2   'Item 2'
3   'Item 3'


VendorItem Model << the table in question
================

id  item_id  vendor_id  lat   lon
1   1        1          38    -122
2   2        1          38.2  -121.8
3   3        2          37.9  -121.9
4   1        2          ...   ...
5   2        2          ...   ...
6   2        3          ...   ...

As you can see, there are multiple VendorItems for the same Item; however, when searching I only want to retrieve at most one result for each item. Therefore I need the item_id column to be unique/distinct.

I have tried faceting on the item_id column, and then executing the following query:

from haystack.query import SearchQuerySet

sqs = SearchQuerySet().filter(content=query).facet('item_id')
counts = sqs.facet_counts()

# ids will look like: [345, 892, 123, 34,...]
ids = [i[0] for i in counts['fields']['item_id']]

items = VendorItem.objects.filter(vendor__lat__gte=latMin,
    vendor__lon__gte=lonMin, vendor__lat__lte=latMax,
    vendor__lon__lte=lonMax, item_id__in=ids).distinct(
        'item').select_related('vendor', 'item')

The main problem here is that results are limited to 100 items, and they cannot be sorted with haystack.

Solution

I think the best advice I can give you is to stop using Haystack.

Haystack's default backend (the elasticsearch_backend.py) is mostly written with Solr in mind. There are a lot of annoyances that I find in haystack, but the biggest has to be that it packs all queries into something called query_string. Using query string, they can use the lucene syntax, but it also means losing the entire elasticsearch DSL. The lucene syntax has some advantages, especially if this is what you are used to, but it is very limiting from an elasticsearch point of view.
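To make that difference concrete, here is a minimal sketch of the two request-body shapes as plain Python dicts. The field name text, the search term and the helper names are illustrative assumptions, not something haystack exposes; the first body is the single query_string style described above, the second expresses roughly the same intent as structured DSL clauses.

```python
import json


def query_string_body(lucene_query):
    """Pack the whole search into one Lucene-syntax string."""
    return {"query": {"query_string": {"query": lucene_query}}}


def dsl_body(text, item_id):
    """Express a similar intent as separate, structured DSL clauses."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"text": text}},        # analyzed full-text match
                    {"term": {"item_id": item_id}},   # exact value match
                ]
            }
        }
    }


# Hypothetical values, for illustration only.
packed = query_string_body("text:widget AND item_id:1")
structured = dsl_body("widget", 1)

print(json.dumps(packed, indent=2))
print(json.dumps(structured, indent=2))
```

With the structured form, each clause can be swapped for any other DSL query type, which is the flexibility that is lost when everything goes through query_string.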

Furthermore, I think you are applying an RDBMS concept to a search engine. That isn't to say that you shouldn't get the results you need, but the approach is often different.

The way you might query and retrieve this data might be different if you don't use haystack because haystack creates indexes in a way more appropriate for solr than for elasticsearch.

For example, in creating a new index, haystack will assign a "type" called "modelresult" to all models that will go in an index.

So, let's say you have some entities called Items and some other entities called vendoritems.

It might be appropriate to have them both in the same index but with vendoritems as a type of vendoritems and items having a type of items.

When querying, you would then query based on the REST endpoint, so something like localhost:9200/index/type (query). The way haystack achieves this is through the django content types module. Accordingly, there is a field called "django_ct" that haystack queries and attaches to any query you might make when you are only looking for unique items.

To illustrate the above:

This endpoint searches across all indexes:

`localhost:9200/`

This endpoint searches across all types in an index:

`localhost:9200/yourindex/`

This endpoint searches in a type within an index:

`localhost:9200/yourindex/yourtype/`

and this endpoint searches in two specified types within an index:

`localhost:9200/yourindex/yourtype,yourothertype/`
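The four endpoint shapes above can be sketched as one small URL builder. The function name search_url and its parameters are hypothetical; the trailing _search segment is the standard Elasticsearch search path. Note that per-type paths like these belong to the Elasticsearch versions this answer was written for; mapping types were removed in later releases.

```python
def search_url(host="localhost:9200", index=None, types=()):
    """Build a search URL for all indexes, one index, or specific types in an index."""
    parts = [host]
    if index:
        parts.append(index)
        if types:
            # Several types are joined with commas in a single path segment.
            parts.append(",".join(types))
    return "http://" + "/".join(parts) + "/_search"


all_indexes = search_url()
one_index = search_url(index="yourindex")
one_type = search_url(index="yourindex", types=["yourtype"])
two_types = search_url(index="yourindex", types=["yourtype", "yourothertype"])

for url in (all_indexes, one_index, one_type, two_types):
    print(url)
```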

Back to haystack though, you can possibly get unique values by adding a django_ct to your query, but likely that isn't what you want.

What you really want to do is a facet, and probably you want to use term facets. This could be a problem in haystack because it A.) analyzes all text and B.) applies store=True to all fields (really not something you want to do in elasticsearch, but something you often want to do in solr).

You can order facet results in elasticsearch (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-facet.html#_ordering).
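As a rough sketch of what an ordered terms facet request looks like, the helper below builds the request fragment as a Python dict. The helper name terms_facet and the default size are assumptions for illustration; the order values are the ones the terms facet accepted in the facets API of that era. Facets were later replaced by aggregations, so treat this as version-specific.

```python
VALID_ORDERS = {"count", "term", "reverse_count", "reverse_term"}


def terms_facet(field, order="count", size=10):
    """Return a request-body fragment for a terms facet on `field`."""
    if order not in VALID_ORDERS:
        raise ValueError("unsupported facet order: %r" % (order,))
    return {
        "facets": {
            # The facet is named after the field it covers.
            field: {"terms": {"field": field, "order": order, "size": size}}
        }
    }


# Hypothetical usage: order item_id buckets by term, ask for up to 500 of them.
body = terms_facet("item_id", order="term", size=500)
print(body)
```

Raising size is also the lever for getting more than a fixed number of facet entries back.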

I don't mean for this to be a slam on haystack. I think it does a lot of things right conceptually. It's especially good if all you need to do is index a single model (like say a blog) and just have it quickly return results.

That said, I highly recommend using elasticutils. Some of the concepts from haystack are similar, but it uses the search dsl, rather than query_string (but you can still use query_string if you want).

Be warned though, I don't think you can order facets using elasticutils by default, but you can just pass a python dictionary of the facets you want to the facet_raw method (something I don't think you can do in haystack).

Your last option is to create your own haystack backend, inherit from the existing backend and just add some functionality to the .facet() method to allow for ordering per the above dsl.
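One way to picture that last option: a subclassed backend could let the existing backend build its request dict and then post-process the facet section before the request is sent. The helper below is a plain-Python sketch of that post-processing step only. The function name with_facet_order and the shape of the input dict (a "facets" key holding terms facets) are assumptions, so check them against the backend version you actually run before wiring it in.

```python
import copy


def with_facet_order(search_kwargs, order="term"):
    """Return a copy of the request dict with `order` set on every terms facet."""
    ordered = copy.deepcopy(search_kwargs)  # leave the caller's dict untouched
    for options in ordered.get("facets", {}).values():
        if "terms" in options:
            options["terms"]["order"] = order
    return ordered


# Hypothetical request dict, shaped like a query_string search with one terms facet.
original = {
    "query": {"query_string": {"query": "widget"}},
    "facets": {"item_id": {"terms": {"field": "item_id", "size": 100}}},
}
updated = with_facet_order(original, order="term")
print(updated["facets"])
```

Returning a copy rather than mutating in place keeps the inherited backend's behaviour unchanged wherever the ordering is not requested.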
