Django的性能问题不包括在内 [英] Performance issue with django exclude

查看:130
本文介绍了Django的性能问题不包括在内的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个Django 1.8应用程序,并且我正在使用MsSQL数据库,并将pyodbc作为数据库后端(使用"django-pyodbc-azure"模块).

I have a Django 1.8 application, and I am using an MsSQL database, with pyodbc as the db backend (using "django-pyodbc-azure" module).

我有以下型号:

class Branch(models.Model):
    name = models.CharField(max_length=30)
    startTime = models.DateTimeField()

class Device(models.Model):
    uid = models.CharField(max_length=100, primary_key=True)
    type = models.CharField(max_length=20)
    firstSeen = models.DateTimeField()
    lastSeen = models.DateTimeField()

class Session(models.Model):
    device = models.ForeignKey(Device)
    branch = models.ForeignKey(Branch)
    start = models.DateTimeField()
    end = models.DateTimeField(null=True, blank=True)

我需要查询会话模型,并且要排除某些具有特定设备值的记录.因此,我发出以下查询:

I need to query the session model, and I want to exclude some records with specific device values. So I issue the following query:

sessionCount = Session.objects.filter(branch=branch)
                          .exclude(device__in=badDevices)                                             
                          .filter(end__gte=F('start')+timedelta(minutes=30)).count()

badDevices是设备ID的预填充列表,其中包含约60个项目.

badDevices is a pre-filled list of device ids with around 60 items.

badDevices = ['id-1', 'id-2', ...]

此查询大约需要1.5秒才能完成.如果我从查询中删除排除项,则大约需要250毫秒.

This query takes around 1.5 seconds to complete. If I remove the exclude from the query, it takes around 250 miliseconds.

我为该查询集打印了生成的sql,并在数据库客户端中对其进行了尝试.在那里,两个版本都在大约250毫秒内执行.

I printed the generated sql for this queryset, and tried it in my database client. There, both versions executed in around 250 miliseconds.

这是生成的SQL:

SELECT [session].[id], [session].[device_id], [session].[branch_id], [session].[start], [session].[end] 
FROM [session] 
WHERE ([session].[branch_id] = my-branch-id AND 
NOT ([session].[device_id] IN ('id-1', 'id-2', 'id-3',...)) AND 
DATEPART(dw, [session].[start]) = 1 
AND [session].[end] IS NOT NULL AND 
[session].[end] >= ((DATEADD(second, 600, CAST([session].[start] AS datetime)))))

因此,在数据库级别使用排除似乎不会影响查询性能,但是在django中,如果添加排除部分,查询运行速度会慢6倍.可能是什么原因造成的?

So, using the exclude in database level doesn't seem to be affecting the query performance, but in django, the query runs 6 times slower if I add the exclude part. What could be causing this?

推荐答案

一般的问题似乎是django正在做一些额外的工作来准备exclude子句.在该步骤之后,到生成SQL并将其发送到数据库时,在Django端没有发生任何有趣的事情,而这可能会导致如此大的延迟.

The general issue seems to be that django is doing some extra work to prepare the exclude clause. After that step and by the time the SQL has been generated and sent to the database, there isn't anything interesting happening on the django side that could cause such a significant delay.

在您的情况下,可能导致此问题的原因是badDevices的某种预处理.例如,如果badDevicesQuerySet,则django可能正在执行badDevices查询,只是为了准备实际查询的SQL.在device具有非默认主键的情况下,可能会发生类似的情况.

In your case, one thing that might be causing this is some kind of pre-processing of badDevices. If, for instance, badDevices is a QuerySet then django might be executing the badDevices query just to prepare the actual query's SQL. Possibly something similar might be happening in the case where device has a non-default primary key.

另一件事可能会延迟SQL准备,当然是django-pyodbc-azure.可能在编译查询时做一些奇怪的事情,这成为了瓶颈.

The other thing might delay the SQL preparation is of course django-pyodbc-azure. Maybe it's doing something strange while compiling the query and it becomes a bottleneck.

尽管这都是疯狂的猜测,所以如果您仍然遇到此问题,请同时发布DeviceBranch模型,badDevices的确切内容以及从查询生成的SQL.然后也许至少可以消除某些情况.

This is all wild speculation though, so if you're still having this issue then post the Device and Branch models as well, the exact content of badDevices and the SQL generated from the queries. Then maybe some scenarios can be at least eliminated.

我认为它必须是Device.uid字段. django或pyodbc可能会被非默认主键所混淆,并在生成查询时获取所有设备.尝试两件事:

I think it must be the Device.uid field. Possibly django or pyodbc is getting confused by the non-default primary key and is fetching all the devices while generating the query. Try two things:

  • device__in替换为device_id__indevice__pk__indevice__uid__in,然后再次进行检查.也许更明确的查询对于django来说更容易翻译成SQL.您甚至可以尝试用branch_id替换branch,以防万一.

  • Replace device__in with device_id__in, device__pk__in and device__uid__in and check each one again. Maybe a more explicit query will be easier for django to translate into SQL. You can even try replacing branch with branch_id, just in case.

如果上述方法无效,请尝试使用原始SQL where子句替换exclude表达式:

If the above doesn't work, try replacing the exclude expression with a raw SQL where clause:

# add quotes (because of the hyphens) & join
badDevicesIdString = ", ".join(["'%s'" % id for id in badDevices])

# Replaces .exclude()
... .extra(where=['device_id NOT IN (%s)' % badDevicesIdString])

如果两种方法都不起作用,那么问题很可能出在整个查询上,而不仅仅是exclude.在这种情况下,还有更多选择,但请先尝试上述方法,稍后如有必要,我将更新我的答案.

If neither works, then most likely the problem is with the whole query and not just exclude. There are some more options in that case but try the above first and I will update my answer later if necessary.

这篇关于Django的性能问题不包括在内的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆