Django的性能问题不包括在内 [英] Performance issue with django exclude
问题描述
我有一个Django 1.8应用程序,并且我正在使用MsSQL数据库,并将pyodbc作为数据库后端(使用"django-pyodbc-azure"模块).
I have a Django 1.8 application, and I am using an MsSQL database, with pyodbc as the db backend (using "django-pyodbc-azure" module).
我有以下型号:
class Branch(models.Model):
name = models.CharField(max_length=30)
startTime = models.DateTimeField()
class Device(models.Model):
uid = models.CharField(max_length=100, primary_key=True)
type = models.CharField(max_length=20)
firstSeen = models.DateTimeField()
lastSeen = models.DateTimeField()
class Session(models.Model):
device = models.ForeignKey(Device)
branch = models.ForeignKey(Branch)
start = models.DateTimeField()
end = models.DateTimeField(null=True, blank=True)
我需要查询会话模型,并且要排除某些具有特定设备值的记录.因此,我发出以下查询:
I need to query the session model, and I want to exclude some records with specific device values. So I issue the following query:
sessionCount = Session.objects.filter(branch=branch)
.exclude(device__in=badDevices)
.filter(end__gte=F('start')+timedelta(minutes=30)).count()
badDevices是设备ID的预填充列表,其中包含约60个项目.
badDevices is a pre-filled list of device ids with around 60 items.
badDevices = ['id-1', 'id-2', ...]
此查询大约需要1.5秒才能完成.如果我从查询中删除排除项,则大约需要250毫秒.
This query takes around 1.5 seconds to complete. If I remove the exclude from the query, it takes around 250 miliseconds.
我为该查询集打印了生成的sql,并在数据库客户端中对其进行了尝试.在那里,两个版本都在大约250毫秒内执行.
I printed the generated sql for this queryset, and tried it in my database client. There, both versions executed in around 250 miliseconds.
这是生成的SQL:
SELECT [session].[id], [session].[device_id], [session].[branch_id], [session].[start], [session].[end]
FROM [session]
WHERE ([session].[branch_id] = my-branch-id AND
NOT ([session].[device_id] IN ('id-1', 'id-2', 'id-3',...)) AND
DATEPART(dw, [session].[start]) = 1
AND [session].[end] IS NOT NULL AND
[session].[end] >= ((DATEADD(second, 600, CAST([session].[start] AS datetime)))))
因此,在数据库级别使用排除似乎不会影响查询性能,但是在django中,如果添加排除部分,查询运行速度会慢6倍.可能是什么原因造成的?
So, using the exclude in database level doesn't seem to be affecting the query performance, but in django, the query runs 6 times slower if I add the exclude part. What could be causing this?
推荐答案
一般的问题似乎是django正在做一些额外的工作来准备exclude
子句.在该步骤之后,到生成SQL并将其发送到数据库时,在Django端没有发生任何有趣的事情,而这可能会导致如此大的延迟.
The general issue seems to be that django is doing some extra work to prepare the exclude
clause. After that step and by the time the SQL has been generated and sent to the database, there isn't anything interesting happening on the django side that could cause such a significant delay.
在您的情况下,可能导致此问题的原因是badDevices
的某种预处理.例如,如果badDevices
是QuerySet
,则django可能正在执行badDevices
查询,只是为了准备实际查询的SQL.在device
具有非默认主键的情况下,可能会发生类似的情况.
In your case, one thing that might be causing this is some kind of pre-processing of badDevices
. If, for instance, badDevices
is a QuerySet
then django might be executing the badDevices
query just to prepare the actual query's SQL. Possibly something similar might be happening in the case where device
has a non-default primary key.
另一件事可能会延迟SQL准备,当然是django-pyodbc-azure
.可能在编译查询时做一些奇怪的事情,这成为了瓶颈.
The other thing might delay the SQL preparation is of course django-pyodbc-azure
. Maybe it's doing something strange while compiling the query and it becomes a bottleneck.
尽管这都是疯狂的猜测,所以如果您仍然遇到此问题,请同时发布Device
和Branch
模型,badDevices
的确切内容以及从查询生成的SQL.然后也许至少可以消除某些情况.
This is all wild speculation though, so if you're still having this issue then post the Device
and Branch
models as well, the exact content of badDevices
and the SQL generated from the queries. Then maybe some scenarios can be at least eliminated.
我认为它必须是Device.uid
字段. django或pyodbc可能会被非默认主键所混淆,并在生成查询时获取所有设备.尝试两件事:
I think it must be the Device.uid
field. Possibly django or pyodbc is getting confused by the non-default primary key and is fetching all the devices while generating the query. Try two things:
-
将
device__in
替换为device_id__in
,device__pk__in
和device__uid__in
,然后再次进行检查.也许更明确的查询对于django来说更容易翻译成SQL.您甚至可以尝试用branch_id
替换branch
,以防万一.
Replace
device__in
withdevice_id__in
,device__pk__in
anddevice__uid__in
and check each one again. Maybe a more explicit query will be easier for django to translate into SQL. You can even try replacingbranch
withbranch_id
, just in case.
如果上述方法无效,请尝试使用原始SQL where子句替换exclude表达式:
If the above doesn't work, try replacing the exclude expression with a raw SQL where clause:
# add quotes (because of the hyphens) & join
badDevicesIdString = ", ".join(["'%s'" % id for id in badDevices])
# Replaces .exclude()
... .extra(where=['device_id NOT IN (%s)' % badDevicesIdString])
如果两种方法都不起作用,那么问题很可能出在整个查询上,而不仅仅是exclude
.在这种情况下,还有更多选择,但请先尝试上述方法,稍后如有必要,我将更新我的答案.
If neither works, then most likely the problem is with the whole query and not just exclude
. There are some more options in that case but try the above first and I will update my answer later if necessary.
这篇关于Django的性能问题不包括在内的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!