如何改进Django中的查询性能admin相关领域搜索(MySQL) [英] How to improved query performance in Django admin search on related fields (MySQL)

查看:164
本文介绍了如何改进Django中的查询性能admin相关领域搜索(MySQL)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在Django我有这样的:



models.py

  class Book(models.Model):
isbn = models.CharField(max_length = 16,db_index = True)
title = models.CharField(max_length = 255,db_index =真的)
...其他字段...

类作者(models.Model):
first_name = models.CharField(max_length = 128,db_index = True)
last_name = models.CharField(max_length = 128,db_index = True)
books = models.ManyToManyField(Book,blank = True)
...其他字段...

admin.py

  class AuthorAdmin(admin.ModelAdmin):
search_fields =('first_name','last_name','books__isbn','books__title')

...

我的问题是当我从2个或更多短期的作者管理列表页面进行搜索时,MySQL开始花费很多时间(至少8秒查询3个字)。我有约5000作者和2500书。这里的 非常重要。如果我搜索a b c,那么3个很短的条件,我没有足够的耐心等待结果(我等了至少2分钟)。相反,如果我搜索所有的蜜蜂线索,我得到的结果在2秒。所以这个问题看起来真的是相关领域的短语。



这个搜索产生的SQL查询有很多JOIN,LIKE,AND和OR,但没有子查询



我使用的是MySQL 5.1,但我尝试使用5.5,没有更多的成功。



我也试过将 innodb_buffer_pool_size 增加到一个非常大的值。没有任何改变。



我现在唯一想要提高性能的方法是将其标准化为 isbn title field(即将它们直接复制到Authors中),但是我必须添加一大堆技巧来保持这些字段与Book中的真实字段同步。



有关如何改进此查询的任何建议?

解决方案

经过大量调查发现问题来自于如何为管理搜索字段(在 ChangeList 类中)构建搜索查询。在多项搜索(由空格分隔的单词)中,每个术语通过链接新的 filter()添加到QuerySet。当 search_fields 中有一个或多个相关字段时,创建的SQL查询将有很多链接的$ code> JOIN 另一个对于每个相关字段都有很多 JOIN (请参阅我的相关问题的一些例子和更多的信息)。这个链 JOIN 是这样的,所以每个术语只能在先前的术语数据过滤子集中搜索,最重要的是,一个相关的字段只需要一个术语(vs需要有所有术语)进行匹配。请参阅跨越多重关系在Django文档中获取有关此主题的更多信息。我很确定这是管理员搜索字段大部分时间所需要的行为。



此查询的缺点(涉及相关领域)是性能(执行查询的时间)可能真的很大。这取决于很多因素:搜索字词数量,搜索字词,字段搜索类型(VARCHAR等),字段搜索的数量,表中的数据,表的大小等。使用正确的组合很容易要查询一般会永远占用(对于我来说,需要10分钟以上的查询是在这个搜索字段的上下文中永远存在的查询)。



为什么需要这么长时间的原因是数据库需要为每个术语创建一个临时表,并且大部分扫描完全是为了搜索下一个术语。所以,这很快就加起来了。



为了提高性能而做的一个可能的改变是将 $ C>过滤器()。这样他们将只有一个 JOIN 由相关字段(或2如果它是一个很多到许多),而不是更多。这个查询会更快,性能变化非常小。缺点是相关字段必须具有所有条款匹配,所以在许多情况下可以减少匹配。



更新



根据 trinchet 的要求,这里需要做更改搜索行为(对于Django 1.7)。您需要覆盖希望进行这种搜索的管理类的 get_search_results()。您需要将基类( ModelAdmin )中的所有方法代码复制到您自己的类中。然后,您需要更改以下行:

  for search_term.split():
or_queries = [models。对于orm_lookups中orm_lookup的

queryset = queryset.filter(reduce(operator.or_,or_queries))
/ pre>

为此:

  and_queries = [] 

and_queries.append(Q(reduce));
和_queries.append() (operator.or_或or_queries))
queryset = queryset.filter(reduce(operator.and_,and_queries))

此代码未经测试。我的原始代码是Django 1.4,我只是在这里适应1.7。


In Django I have this:

models.py

class Book(models.Model):
    isbn = models.CharField(max_length=16, db_index=True)
    title = models.CharField(max_length=255, db_index=True)
    ... other fields ...

class Author(models.Model):
    first_name = models.CharField(max_length=128, db_index=True)
    last_name = models.CharField(max_length=128, db_index=True)
    books = models.ManyToManyField(Book, blank=True)
    ... other fields ...

admin.py

class AuthorAdmin(admin.ModelAdmin):
    search_fields = ('first_name', 'last_name', 'books__isbn', 'books__title')

    ...

My problem is when I do a search from the Author admin list page with 2 or more short terms, MySQL start to take a lot of time (at least 8 sec. for a 3 terms query). I have around 5000 Authors and 2500 Books. The short here is very important. If I search for 'a b c', so 3 really short terms, I'm not enough patient to wait for the result (I waited at least 2 min.). Instead if I search for 'all bee clue' I got the result in 2 sec. So the problem look the be really with short terms on related fields.

The SQL query resulting from this search have a lot of JOIN, LIKE, AND and OR but no subquery.

I'm using MySQL 5.1 but I tried with 5.5 with no more success.

I also tried to increase the innodb_buffer_pool_size to a really large value. That change nothing.

The only idea I have right now to improve the performance is to denormalize to isbn and title field (ie copy them directly into Authors) but I will have to add a bunch of mechanics to keep these fields in sync with the real ones in Book.

Any suggestions on how to improve this query?

解决方案

After a lot of investigations I found that the problem come from how the search query is built for the admin search field (in the ChangeList class). In a multi-terms search (words separated by space) each term is added to the QuerySet by chaining a new filter(). When there's one or more related fields in the search_fields, the created SQL query will have a lot of JOIN chained one after the other with many JOIN for each related field (see my related question for some examples and more info). This chain of JOIN is there so that each term will be search only in the subset of data filter by the precedent term AND, most important, that a related field need to only have one term (vs needing to have ALL terms) to make a match. See Spanning multi-valued relationships in the Django docs for more info on this subject. I'm pretty sure it's the behavior wanted most of the time for the admin search field.

The drawback of this query (with related fields involved) is that the variation in performance (time to perform the query) can be really large. It depends on a lot of factors: number of searched terms, terms searched, kind of field search (VARCHAR, etc.), number of field search, data in the tables, size of the tables, etc. With the right combination it's easy to have a query that will take mostly forever (a query that take more then 10 min. for me is a query that take forever in the context of this search field).

The reason why it can take so long is that the database need to create a temporary table for each term and scan it mostly entirely to search for the next term. So, this adds up really quickly.

A possible change to do to improve the performance is to ANDed all terms in the same filter(). This way their will be only one JOIN by related field (or 2 if it's a many to many) instead of many more. This query will be a lot faster and with really small performance variation. The drawback is that related fields will have to have ALL the terms to match, so, you can get less matches in many cases.

UPDATE

As asked by trinchet here’s what’s needed to do the change of search behavior (for Django 1.7). You need to override the get_search_results() of the admin classes where you want this kind of search. You need to copy all the method code from the base class (ModelAdmin) to your own class. Then you need to change those lines:

for bit in search_term.split():
    or_queries = [models.Q(**{orm_lookup: bit})
                  for orm_lookup in orm_lookups]
    queryset = queryset.filter(reduce(operator.or_, or_queries))

To that:

and_queries = []
for bit in search_term.split():
    or_queries = [models.Q(**{orm_lookup: bit})
                  for orm_lookup in orm_lookups]
    and_queries.append(Q(reduce(operator.or_, or_queries)))
queryset = queryset.filter(reduce(operator.and_, and_queries))

This code is not tested. My original code was for Django 1.4 and I just adapt it for 1.7 here.

这篇关于如何改进Django中的查询性能admin相关领域搜索(MySQL)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆