Django:为什么Foo.objects.extra(...)比Foo.objects.raw快得多? [英] Django: Why is Foo.objects.extra(...) So Much Faster Than Foo.objects.raw?

查看:46
本文介绍了Django:为什么Foo.objects.extra(...)比Foo.objects.raw快得多?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

因此,我正在尝试优化一个相当奇怪的查询,但这是一个旧数据库,因此我会尽力而为.这些是我正在尝试的查询.此时,它们提供相同的输出.w是我的查询集.

So I am trying to optimize a fairly odd query, but this is a legacy database so I make do with what I have. These are the queries I am trying. They provide the same output at this point. w is my queryset.

def future_schedule(request):

    past = datetime.date.today()-datetime.timedelta(days=730)

    extra_select = {
        'addlcomplete': 'SELECT Complete FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID',
        'addldate': 'SELECT AddlDate FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID'
    }
    extra_where = ['''(Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0) '''
    ]
    extra_params = [past, past]

    w = Checkin.objects.extra(select=extra_select, where=extra_where, params=extra_params)

# OR This one

    w = Checkin.objects.raw('''SELECT Checkin.SampleID, Checkin.ShortSampleID, Checkin.Company, A.Complete, Checkin.HasDates, A.AddlDate FROM Checkin LEFT JOIN (SELECT ShortSampleID, Complete, AddlDate FROM tblAdditionalDates) A ON A.ShortSampleID = Checkin.ShortSampleID WHERE (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0)''')

这两个都返回相同数量的记录(322)..extra呈现HTML的速度比.raw查询快10秒钟左右.出于所有密集的目的,.raw查询的复杂程度甚至略低.是否有人对此有任何见解?根据我的结构,.raw可能是获取所需数据的唯一方法(我需要在extra_select dict中使用addlcomplete和addldate,并在Haveing子句中使用它们来进一步过滤查询集),但我当然不喜欢如何这需要很长时间.是在模板层上变慢还是在实际查询层上?我怎样才能最好地调试它?

Both of these return the same number of records (322). .extra is about 10 seconds faster in rendering the HTML than the .raw query and for all intensive purposes, the .raw query is mildly less complex even. Does anyone have any insight as to why this might be? Based on my structure, .raw may be the only way I get the data I need (I need the addlcomplete and addldate in the extra_select dict and use them in a Having clause to further filter the queryset) but I certainly don't like how long it is taking. Is it on the template layer that it is slower or the actual query layer? How can I best debug this?

感谢您在数据结构不良的情况下寻求优化的帮助.

Thank for your help in this quest for optimization amidst poor data structures.

更新1:2011-10-03

因此,我安装了django-debugtoolbar进行了窥探,并启用了MySQL常规日志记录,并提出了以下建议:

So I installed django-debugtoolbar to snoop around a bit and I eneabled MySQL general logging and came up with the following:

使用 .filter() .extra()的查询总数为2.使用 .raw()的查询总数为 1984 !!! (不忽略怪异的文学参考)

using .filter() or .extra() Total Query count is 2. Using .raw() Total Query count is 1984!!! (Spooky literary reference not ignored)

我的模板正在使用重组,然后在该重组中循环.不遵循任何关系,除内置函数外不使用任何模板标签.Select_related没有被使用,我仍然只得到2个查询.看看mysql日志,肯定可以-1984年查询.

My template is using a regroup and then looping through that regroup. No relations are being followed, no template tags other than builtins are being used. Select_related is NOT being used and I still only get the 2 queries. Looking at the mysql log, sure enough - 1984 queries.

当查看执行的查询时,基本上对于每个 {{Modelinstance.field}} 来说,django都在执行 SELECT pk,字段FROM Model WHERE Model.pk =Modelinstance.pk 如果您问我,这似乎是完全错误的.我在这里错过了什么吗,还是django真的在查询中疯狂奔跑?

When looking at the queries that were executed, basically it looks like for every {{ Modelinstance.field }} django was doing a SELECT pk, field FROM Model WHERE Model.pk = Modelinstance.pk This seems completely wrong if you ask me. Am I missing something here or is django really running wild with queries?

END UPDATE 1

更新2 请参阅下面的答案

格雷格

推荐答案

好.这是我的最终结论.尽管Furbeenator对内部Django优化是正确的,但事实证明,存在一个更大的用户错误,它导致速度下降以及上述数千个查询.

Ok. Here are my final conclusions. While Furbeenator is correct about the internal Django optimizations, turns out there is a much larger, user error that caused the slowdown and the aforementioned thousands of queries.

原始查询集文档中清楚地记录了该文档表示当您推迟字段(即不使用 SELECT * FROM ... )并仅选择特定字段时( SELECT Checkin.Sampleid,... 仍然可以通过另一个数据库调用来访问未选择的字段.因此,如果您在原始查询中选择字段的子集,而又忘记了要在模板中使用的查询中的字段,则Django将执行数据库查找以找到您要在模板中引用的字段,而不是抱怨它不存在或其他问题.因此,假设您从查询中遗漏了5个字段(我所做的),最终您在模板中引用了该字段,您有300条要遍历的记录.这将额外引起1500次数据库匹配,从而为每条记录获取这5个字段.

It is clearly documented in the Raw queryset docs that when you defer fields (i.e. not using SELECT * FROM ...) and are selecting only certain fields specifically (SELECT Checkin.Sampleid, ... the fields that you don't select can still be accessed but with another database call. So, if you are selecting a subset of fields in your raw query and you forgot a field in your query that you use in your template, Django performs a database lookup to find that field you are referencing in your template rather than complaining about it not existing or whatever. So, let's say you leave out 5 fields from your query (which is what I did) that you end up referencing in your template and you have 300 records that you are looping through. This incurs 1500 extra database hits to get those 5 fields for each record.

因此,请注意隐藏的引用,并感谢上帝为 Django Debug Toolbar

So, beware of hidden references and thank god for Django Debug Toolbar

这篇关于Django:为什么Foo.objects.extra(...)比Foo.objects.raw快得多?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆