与PostgreSQL奇怪的排序规则 [英] Strange collation with postgresql

查看:250
本文介绍了与PostgreSQL奇怪的排序规则的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我注意到postgresql-9.5有一个奇怪的排序规则问题,因为它给Python脚本提供了不同的输出。据我了解,排序时通常是从左到右一次比较一个字符:

I noticed a strange collation issue with postgresql-9.5 as it was giving different output to a Python script. As I understand it, normally characters are compared one at a time from left to right when sorting:

select 'ab' < 'ac';
 t

select 'abX' < 'ac';
 t

因此,如果在上面的左手字符串中添加 X,则无关紧要。

So it's irrelevant if you add the 'X' to the left hand string above.

因此,当它不能用于空格和破折号之间的比较时,我感到很惊讶:

So I was surprised when this does not hold for comparison between a space and a dash:

select 'a ' < 'a-';
 t

select 'a X' < 'a-';
 f

这是一个错误还是有解决办法?我希望上面的最后一条语句也返回true。

Is it a bug or is there any way around this; I want the last statement above to also return true.

[接受@ laurenz-albe的回答后编辑]

show lc_collate;
 en_US.UTF-8

如果我在Python中按Unicode排序,我会得到类似的结果因此,这不是Postgresql中的错误,而是我认为Unicode排序规范中的错误!

If I do sorting by unicode in Python I get similar results so it's not a bug in postgresql, but a bug in the unicode collation spec in my opinion!:

>>> import locale; locale.setlocale(locale.LC_ALL, "")
'en_US.UTF-8'
>>> l = ['ac', 'ab']; sorted(l) == sorted(l, cmp=locale.strcoll)
True
>>> l = ['ac', 'abX']; sorted(l) == sorted(l, cmp=locale.strcoll)
True
>>> l = ['ac', 'abX']; sorted(l) == sorted(l, cmp=locale.strcoll)
True
>>> l = ['a-', 'a X']; sorted(l) == sorted(l, cmp=locale.strcoll)
False
>>> sorted(l),  sorted(l, cmp=locale.strcoll)
(['a X', 'a-'], ['a-', 'a X'])


推荐答案

使用归类比较字符。您可以使用SQL命令

Characters are compared using a collation. You can use the SQL command

SHOW lc_collate;

看看您的默认排序规则是什么。

to see what your default collation is.

PostgreSQL使用操作系统的归类,因此比较的结果通常取决于操作系统。

PostgreSQL uses the operating system's collations, so the result of a comparison will often depend on the operating system.

要强制按字节ASCII比较,请使用C归类

To force byte-wise ASCII comparison, use the C collation

test=> SELECT 'a X' COLLATE "C" < 'a-';
 ?column?
----------
 t
(1 row)

或按字节比较的运算符

test=> SELECT 'a X' ~<~ 'a-';
 ?column?
----------
 t
(1 row)

这篇关于与PostgreSQL奇怪的排序规则的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆