SQLAlchemy - only one result being returned when count() says there are more


Problem description

I'm having trouble with a really large result set only returning one row.

Session.query(TestSet).join(Instance).count()
>> 4283878
Session.query(TestSet).join(Instance).offset(0).limit(100).count()
>> 100
Session.query(TestSet).join(Instance).offset(0).limit(100).all()
>> [<model.testset.TestSet object at 0x043EC2F0>]

That is, all() returns only one instance of my model instead of 100. Now, for something even stranger:

len(Session.query(TestSet).join(Instance).offset(0).limit(100).distinct().all())
>> 100

So if I add distinct() before all(), I get back all 100 results. What's going on here?

Solution

When asked to iterate over results that represent an entity such as TestSet, the Query object uniques the result rows on object identity: if the query returns 100 rows that all carry the same TestSet primary key, you get only one result object back. This behavior originates in Query's "eager joining" feature, where many result rows typically share the same primary identity while each carries a different secondary identity, that of a related row to be populated into a collection on the primary object. In this very common case, only one instance of each primary identity is wanted.
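The uniquing described above can be mimicked in a few lines of plain Python. This is only an illustrative sketch, not SQLAlchemy's actual implementation: the `TestSet` dataclass and the `load_entities` helper are hypothetical stand-ins, and the rows are made-up `(testset_id, instance_id)` pairs.

```python
from dataclasses import dataclass


@dataclass
class TestSet:
    """Hypothetical stand-in for the mapped class; only the primary key matters here."""
    id: int


def load_entities(rows):
    """Turn (testset_id, instance_id) rows into TestSet objects, one per primary key."""
    identity_map = {}  # primary key -> the single object for that identity
    result = []
    for testset_id, _instance_id in rows:
        if testset_id not in identity_map:
            identity_map[testset_id] = TestSet(id=testset_id)
            result.append(identity_map[testset_id])
        # A repeated primary key refers to an object already in the result,
        # so the row contributes nothing new.
    return result


# 100 joined rows that all carry TestSet primary key 1 but different Instance keys.
rows = [(1, instance_id) for instance_id in range(100)]
entities = load_entities(rows)
print(len(rows), len(entities))  # 100 1
```

One hundred rows go in and one object comes out, which matches the single-element list seen in the question.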

Now consider what distinct() does. Suppose your query for 4M objects returns 1000 rows with id=1, 1000 rows with id=2, and so on. The query with limit(100) fetches the first 100 rows, which all have id=1; Query uniquifies them, and you get one result object back. count(), by contrast, counts the rows in SQL, where no uniquing takes place, which is why it reports 100. With distinct(), the SQL itself returns 100 rows with distinct identities, i.e. "id=1", "id=2", "id=3", and so on. Query then assigns each of these rows to a new TestSet object in the identity map, and you get 100 objects back.
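The row-level difference between the two queries can be reproduced without SQLAlchemy. The sketch below uses the standard-library sqlite3 module with a hypothetical two-table schema (`testset`, `instance`) and a scaled-down data set. The `ORDER BY` is an assumption added only to make the result deterministic; the queries in the question have none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testset (id INTEGER PRIMARY KEY);
    CREATE TABLE instance (
        id INTEGER PRIMARY KEY,
        testset_id INTEGER NOT NULL REFERENCES testset (id)
    );
""")

# 150 testset rows with 1000 instance rows each (a scaled-down version of the 4M-row case).
conn.executemany("INSERT INTO testset (id) VALUES (?)", [(i,) for i in range(1, 151)])
conn.executemany(
    "INSERT INTO instance (testset_id) VALUES (?)",
    [(ts,) for ts in range(1, 151) for _ in range(1000)],
)

join = "FROM testset JOIN instance ON instance.testset_id = testset.id"

# Without DISTINCT, the first 100 joined rows all belong to testset 1.
plain = conn.execute(
    f"SELECT testset.id {join} ORDER BY testset.id LIMIT 100"
).fetchall()

# With DISTINCT, each of the 100 rows carries a different primary key.
distinct = conn.execute(
    f"SELECT DISTINCT testset.id {join} ORDER BY testset.id LIMIT 100"
).fetchall()

print(len(plain), len({row[0] for row in plain}))        # 100 1
print(len(distinct), len({row[0] for row in distinct}))  # 100 100
```

Both statements return 100 rows, but only the DISTINCT one contains 100 different primary keys, so only that one yields 100 objects after uniquing.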

Temporarily setting echo='debug' on your Engine will show the SQL being emitted as well as the result rows coming back. When you see many result rows that all have the same primary key, you know that Query, when asked to return full entities, will unique those redundant identities down to a single object per primary key.
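A minimal sketch of turning on that logging, assuming an in-memory SQLite database in place of your real database URL; the SELECT statement is a made-up example that returns two rows with the same id:

```python
from sqlalchemy import create_engine, text

# In-memory SQLite is an assumption for this sketch; substitute your own database URL.
engine = create_engine("sqlite://", echo="debug")

with engine.connect() as conn:
    # With echo="debug", the statement and each returned row are written
    # to the "sqlalchemy.engine" logger.
    rows = conn.execute(text("SELECT 1 AS id UNION ALL SELECT 1")).fetchall()

print(len(rows))  # 2
```

The log output shows the emitted SQL followed by the individual rows, so repeated primary keys are easy to spot.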
