PostgreSQL:ORDER BY和LIMIT / OFFSET发生奇怪的冲突 [英] PostgreSQL: strange collision of ORDER BY and LIMIT/OFFSET

查看:422
本文介绍了PostgreSQL:ORDER BY和LIMIT / OFFSET发生奇怪的冲突的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正尝试在PostgreSQL 9.1中做到这一点:

I'm trying to do this in PostgreSQL 9.1:

SELECT m.id, vm.id, vm.value
FROM m
LEFT JOIN vm ON vm.m_id = m.id and vm.variation_id = 1
ORDER BY lower(trim(vm.value)) COLLATE "C" ASC LIMIT 10 OFFSET 120

结果是:

 id |  id | value
----+-----+---------------
504 | 511 | "andr-223322"
506 | 513 | "andr-322223"
824 | 831 | "angHybrid"
866 | 873 | "Another thing"
493 | 500 | "App update required!"
837 | 844 | "App update required!"
471 | 478 | "April"
905 | 912 | "Are you sure you want to delete this thing?"
 25 |  29 | "Assignment"
196 | 201 | "AT ADDRESS"

好,让我们用 OFFSET 130 <执行相同的查询/ code>:

Ok, let's execute the same query with OFFSET 130:

 id |  id | value
----+-----+---------------
196 | 201 | "AT ADDRESS"
256 | 261 | "Att Angle"
190 | 195 | "Att Angle"
273 | 278 | "Att Angle:"
830 | 837 | "attAngle"
475 | 482 | "August"
710 | 717 | "Averages"
411 | 416 | "AVG"
692 | 699 | "AVG SHAPE"
410 | 415 | "AVGs"

,我们看到我们的 AT地址

and we see our AT ADDRESS item again, but at the beginning!!!

事实是 vm 表包含以下两个项目:

The fact is that the vm table contains two following items:

 id | m_id | value
----+------+---------------
201 |  196 | "AT ADDRESS"
599 |  592 | "At Address"

我可以通过以下方法解决这种情况:

I cure this situation with a workaround:

(lower(trim(vm.value)) || vm.id)

但是什么地狱??? !!!
为什么必须使用替代方法?

but What The Hell ???!!! Why do I have to use a workaround?

推荐答案

宣誓就职不会更改定义此行为的SQL标准。

除非在 ORDER BY 中指定,否则行顺序为未定义。 每个文档

Swearing won't change the SQL standard that defines this behaviour.
The order of rows is undefined unless specified in ORDER BY. Per documentation:


如果未选择排序,则将以未指定的
顺序返回行。在这种情况下,实际顺序将取决于扫描和加入
计划类型以及磁盘上的顺序,但不能依赖它。只有明确选择了排序步骤,才能保证
的特定输出顺序。

If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.

由于您未定义这两个同伴的排序(按您的排序顺序):

Since you didn't define an order for these two peers (in your sort order):

 id | m_id | value
----+------+---------------
201 |  196 | "AT ADDRESS"
599 |  592 | "At Address"

..您可以随意订购-方便Postgres使用。带有 LIMIT 的查询通常使用不同的查询计划,这可以解释不同的结果。

.. you get arbitrary ordering - whatever is convenient for Postgres. A query with LIMIT often uses a different query plan, which can explain different results.

修复:

ORDER BY lower(trim(vm.value)) COLLATE "C", vm.id;

或者(可能更有意义-也可以调整到现有索引):

Or (maybe more meaningful - possibly also tune to existing indexes):

ORDER BY lower(trim(vm.value)) COLLATE "C", vm.value, vm.id;

(这与在这里使用 COLLATE C 无关,btw。)

请勿连接,这会花费更多,并且可能无法使用索引(除非您拥有该精确表达式的索引)。添加另一个表达式,当 ORDER BY 列表中的先前表达式离开歧义时开始。

(This is unrelated to the use of COLLATE "C" here, btw.)
Don't concatenate for this purpose, that's much more expensive and potentially makes it impossible to use an index (unless you have an index on that precise expression). Add another expression that kicks in when prior expressions in the ORDER BY list leave ambiguity.

此外,因为您有 左联接 在那里, m 中的行在中不匹配vm 的所有当前 ORDER BY 表达式均具有空值。它们排在最后,然后以其他方式任意排序。如果您总体上想要一个稳定的排序顺序,则也需要处理。像这样:

Also, since you have a LEFT JOIN there, rows in m without match in vm have null values for all current ORDER BY expressions. They come last and are sorted arbitrarily otherwise. If you want a stable sort order overall, you need to deal with that, too. Like:

ORDER BY lower(trim(vm.value)) COLLATE "C", vm.id, m.id;



另外



为什么要存储双引号?似乎是昂贵的噪音。没有他们,您可能会更好。

Asides

Why store the double quotes? Seems to be costly noise. You might be better off without them. You can always add the quotes on output if need be.

许多客户端不能在一个结果中多次处理相同的列名。您至少需要为 id 列之一使用列别名: SELECT m.id AS m_id,vm.id AS vm_id ... 。旨在说明为什么列的 id是开头的反模式。

Many clients cannot deal with the same column name multiple times in one result. You need a column alias for at least one of your id columns: SELECT m.id AS m_id, vm.id AS vm_id .... Goes to show why "id" for a column is an anti-pattern to begin with.

这篇关于PostgreSQL:ORDER BY和LIMIT / OFFSET发生奇怪的冲突的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆