Django QuerySet与Raw Query性能 [英] Django QuerySet vs Raw Query performance

查看:370
本文介绍了Django QuerySet与Raw Query性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我注意到使用django connection.cursor与使用模型界面(即使使用小型查询)也有很大的时间差异。
我已经使模型接口尽可能高效,values_list所以没有对象被构造等等。以下是测试的两个功能,不要介意西班牙语的名字。

I have noticed a huge timing difference between using django connection.cursor vs using the model interface, even with small querysets. I have made the model interface as efficient as possible, with values_list so no objects are constructed and such. Below are the two functions tested, don't mind the spanish names.

def t3():
    q = "select id, numerosDisponibles FROM samibackend_eventoagendado LIMIT 1000"
    with connection.cursor() as c:
        c.execute(q)
        return list(c)

def t4():
    return list(EventoAgendado.objects.all().values_list('id','numerosDisponibles')[:1000])

然后使用一个函数来计时(用time.clock()自制)

Then using a function to time (self made with time.clock())

r1 = timeme(t3); r2 = timeme(t4)

结果如下:
0.00180384529631和t3493390727024和t4

The results are as follows: 0.00180384529631 and 0.00493390727024 for t3 and t4

只是为了确保查询是一样的执行:

And just to make sure the queries are and take the same to execute:

connection.queries[-2::]

收益:

[
    {u'sql': u'select id, numerosDisponibles FROM samibackend_eventoagendado LIMIT 1000',  u'time': u'0.002'},
    {u'sql': u'SELECT `samiBackend_eventoagendado`.`id`, `samiBackend_eventoagendado`.`numerosDisponibles` FROM `samiBackend_eventoagendado` LIMIT 1000', u'time': u'0.002'}
]

如您所见,两个确切的查询返回两个确切的列表(执行r1 == r2返回True),具有完全不同的时间(差异越大,查询集越大),我知道python很慢,但是django在幕后做了很多工作,使查询更慢?
另外,为了确保,我已经尝试建立了queryset对象,首先(定时器外),但结果是一样的,所以我100%肯定额外的时间来自于获取和构建结果结构。
我也尝试在查询结尾使用iterator()函数,但是这两者都不会有帮助。
我知道差异是微乎其微的,两个执行速度都很快,但是这是使用apache ab的基准,而当有1k并发请求时,这个最小的差异使得日常和轻松。

As you can see, two exact queries, returning two exact lists (performing r1 == r2 returns True), takes totally different timings (difference gets bigger with a bigger query set), I know python is slow, but is django doing so much work behind the scenes to make the query that slower? Also, just to make sure, I have tried building the queryset object first (outside the timer) but results are the same, so I'm 100% sure the extra time comes from fetching and building the result structure. I have also tried using the iterator() function at the end of the query but that doesn't help neither. I know the difference is minimal, both execute blazingly fast, but this is being bencharked with apache ab, and this minimal difference, when having 1k concurrent requests, makes day and light.

顺便说一句,我使用django 1.7.10与mysqlclient作为db连接器。

By the way, I'm using django 1.7.10 with mysqlclient as the db connector.

编辑:为了比较,相同测试与11k结果查询集,差异变得更大(3倍慢,相比之下,其中约2.6倍慢)

For the sake of comparison, the same test with a 11k result query set, the difference gets even bigger (3x slower, compared to the first one where it is around 2.6x slower)

r1 = timeme(t3); r2 = timeme(t4)
0.0149241530889
0.0437563529558

EDIT2:另一个有趣的测试如果我实际上将查询对象转换为实际的字符串查询(使用str(queryset.query)),并且在原始查询中使用它,那么与使用查询的execpt相比,我获得与原始查询相同的良好性能.query字符串有时会给我一个实际的无效的SQL查询(即如果查询器在日期值上有一个过滤器,则日期值不会在字符串查询中用'进行转义,在使用原始查询执行时会发生sql错误,这是另一个谜题)

Another funny test, if I actually convert the queryset object to it's actual string query (with str(queryset.query)), and use it inside a raw query instead, I get the same good performance as the raw query, by the execption that using the queryset.query string sometimes gives me an actual invalid SQL query (ie, if the queryset has a filter on a date value, the date value is not escaped with '' on the string query, giving an sql error when executing it with a raw query, this is another mystery)

- EDIT3:
通过代码,似乎差异是通过如何检索结果数据,原始查询集,它简单地调用 iter(self.cursor),我相信当使用C实现的连接器将运行所有的C代码(由于iter也是内置的)而ValuesListQuerySet实际上是一个具有yield tuple(row)状态的循环的python级别这将是相当缓慢的。我想在这件事上没有什么可做的,就像原始查询集具有相同的表现:'(。
如果有人有兴趣,那么这个慢循环就是这样的:

-- Going through the code, seems like the difference is made by how the result data is retrieved, for a raw query set, it simply calls iter(self.cursor) which I believe when using a C implemented connector will run all in C code (as iter is also a built in), while the ValuesListQuerySet is actually a python level for loop with a yield tuple(row) statement, which will be quite slow. I guess there's nothing to be done in this matter to have the same performance as the raw query set :'(. If anyone is interested, the slow loop is this one:

for row in self.query.get_compiler(self.db).results_iter():
    yield tuple(row)

- 编辑4:我带来了一个非常奇怪的代码,将值列表查询集转换为可用数据要发送到一个原始查询,具有与运行原始查询相同的性能,我猜这是非常糟糕的,只能使用mysql,但是加速非常好,同时允许我保持模型api过滤等你觉得怎么样?
这是代码。

-- EDIT 4: I have come with a very hacky code to convert a values list query set into usable data to be sent to a raw query, having the same performance as running a raw query, I guess this is very bad and will only work with mysql, but, the speed up is very nice while allowing me to keep the model api filtering and such. What do you think? Here's the code.

def querysetAsRaw(qs):
    q = qs.query.get_compiler(qs.db).as_sql()
    with connection.cursor() as c:
        c.execute(q[0], q[1])
        return c


推荐答案

答案很简单,更新到djan去1.8或更高版本,改变了一些在性能上不再有这个问题的代码。

The answer was simple, update to django 1.8 or above, which changed some code that no longer has this issue in performance.

这篇关于Django QuerySet与Raw Query性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆