Incorrect sort/collation/order with spaces in PostgreSQL 9.4


Question

I'm using PostgreSQL 9.4.5. When I open psql and run \l, I get:

Encoding is UTF8
Collate is en_US.UTF-8 
Ctype is en_US.UTF-8

I have a products table with a name column that contains the following names:

T-700A Grouped
T-700 AGrouped
T-700A Halved
T-700 Whole

When I execute the following SQL in psql:

SELECT name FROM products WHERE name LIKE '%T-700%' ORDER BY name ASC;

I get the following output:

T-700A Grouped
T-700 AGrouped
T-700A Halved
T-700 Whole

That sorting doesn't look natural. I expected to get:

T-700 AGrouped
T-700 Whole
T-700A Grouped
T-700A Halved

It doesn't seem like Postgres is handling spaces the way I expected. Can anyone explain what is happening and suggest a way to fix this?

Recommended answer

On Unix/Linux SE, a friendly expert explained that what you see is the proper way to sort Unicode. Basically, the standard is trying to turn the list on the left into the list on the right:

di Silva Fred                  di Silva Fred
di Silva John                  diSilva Fred
diSilva Fred                   disílva Fred
diSilva John         ->        di Silva John
disílva Fred                   diSilva John
disílva John                   disílva John

If spaces were as important as letters, the sort could not group the differently spelled surnames together and order them by Fred and John. So the sort first compares the strings with spaces ignored. Then, in a second pass, strings that are identical without whitespace are ordered among themselves. (This is a simplification. The real algorithm looks fairly complex, assigning whitespace, accents and non-printable characters various levels of precedence.)
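The simplified two-pass idea can be sketched in the shell with the four names from the question. This is only an illustration, not the real collation algorithm: the first-pass key (lowercased, with spaces and hyphens removed) and the byte-order tie-break are assumptions made for the sketch, and the variable names `tab` and `two_pass` are hypothetical.

```shell
# Pass 1 key: the name lowercased, with spaces and hyphens removed.
# Pass 2 key: the original name, used only to break ties (an assumption;
# the real collation uses its own tie-breaking rules).
# Both comparisons use byte order (LC_ALL=C) so the result is reproducible.
tab=$(printf '\t')
two_pass=$(printf '%s\n' \
  'T-700A Grouped' \
  'T-700 AGrouped' \
  'T-700A Halved' \
  'T-700 Whole' |
  awk -v OFS='\t' '{ key = tolower($0); gsub(/[ -]/, "", key); print key, $0 }' |
  LC_ALL=C sort -t "$tab" -k1,1 -k2,2 |
  cut -f2)

printf '%s\n' "$two_pass"
```

With spaces ignored in the first pass, "T-700 Whole" ends up last, as in the output the question reports. The relative order of the two "Grouped" rows depends on the tie-break and may differ from what the database produces.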

You can bypass the Unicode collation by setting the locale to C in the environment:

export LC_ALL=C
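As a quick check outside the database, the effect of the C locale can be seen with the standard `sort` utility. The sketch below assumes a POSIX shell and feeds in the four product names from the question. It runs entirely in the shell and does not touch the database, and the variable name `sorted` is only illustrative.

```shell
# Sort the four product names by raw byte value (C locale).
# LC_ALL=C applies only to this one sort invocation.
sorted=$(printf '%s\n' \
  'T-700A Grouped' \
  'T-700 AGrouped' \
  'T-700A Halved' \
  'T-700 Whole' | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

In byte order a space (0x20) compares lower than any letter, so the two "T-700 " names come before the two "T-700A" names, which is the order the question expected.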

Or, in Postgres, by casting to a byte array for sorting:

order by name::bytea

Or (from Kiln's answer) by specifying the C collation:

order by name collate "C"

Or by altering the default collation for the column:

alter table products alter column name type text collate "C";
