postgres collation differences: OS X vs. Ubuntu

Problem description

So, I've recently come to realize that collation is a huge deal on Postgres, and that many comments refer to OS X locale support as "broken", which hasn't enlightened me. For the purposes of this question, I'm ignoring the table/column default aspects of collation and specifying the collation explicitly.

  • my laptop is OS X with Postgres 9.2.4
  • my server is Ubuntu with Postgres 9.1.9

Both have this in common:

  # show lc_collate ;
   en_US.UTF-8
  # show lc_ctype ;
   en_US.UTF-8

On my laptop:

select ',' < '-' collate "en_US.UTF-8" as result;
  true

Now, my server does not have the collation "en_US.UTF-8", but it does have "en_US.utf8" (which I recognize is not the same name, though I would expect it to behave the same):

select ',' < '-' collate "en_US.utf8" as result;
 false

So, here's where I'm freaking out. "C" order would always say (on both machines) that ',' is less than '-', which my brain agrees with.

Which UTF-8 implementation is correct? If someone could point me at the definition, that would help, as mostly I've only been able to find accusations of "broken" leveled at OS X. I'd be worried that I've been wrong my entire life in thinking that comma orders before hyphen, but enter a reasonably reliable arbiter of text and Unicode: Python. On the Ubuntu server it yields:

>>> print u',' < u'-', ',' < '-'
True True

So I'm feeling a lot like this collation concept is more broken on my Ubuntu server than on my OS X machine. But I don't have a "proper" collation from which to create my "en_US.UTF-8" collation, à la "create collation", so I'm lost as to how to create parity, or which answer (true/false) I should use as the correct reference (besides personally siding with ASCII order for what are, after all, ASCII characters).

So, in a nutshell: which is the proper answer for en_US.UTF-8?

Recommended answer

In the Default Unicode Collation Element Table (DUCET) you can see these two entries:

002C  ; [*0220.0020.0002] # COMMA
002D  ; [*020D.0020.0002] # HYPHEN-MINUS

Here, the primary weight of COMMA (0220) is greater than the primary weight of HYPHEN-MINUS (020D), so HYPHEN-MINUS sorts before COMMA.
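
As a rough illustration of that comparison, the Python 3 sketch below parses the two quoted table lines and compares their weights. The names `parse_entry`, `WEIGHTS`, and `sorts_before` are made up for this example, and comparing one weight tuple per character is a simplification that only holds for single-character strings; the full Unicode Collation Algorithm builds sort keys level by level and has options for variable elements (the `*` marker), which this sketch ignores.

```python
import re

# The two DUCET lines quoted above, copied verbatim.
DUCET_LINES = [
    "002C  ; [*0220.0020.0002] # COMMA",
    "002D  ; [*020D.0020.0002] # HYPHEN-MINUS",
]

# One code point followed by a single collation element with three weights.
ENTRY = re.compile(
    r"^([0-9A-F]{4,6})\s*;\s*\[[.*]([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\]"
)


def parse_entry(line):
    """Return (character, (primary, secondary, tertiary)) for one table line."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError("unrecognized table line: %r" % line)
    code_point, *levels = match.groups()
    return chr(int(code_point, 16)), tuple(int(level, 16) for level in levels)


# Hypothetical lookup table holding only the two characters discussed here.
WEIGHTS = dict(parse_entry(line) for line in DUCET_LINES)


def sorts_before(a, b):
    """Compare two single characters by their (primary, secondary, tertiary) weights."""
    return WEIGHTS[a] < WEIGHTS[b]


print(sorts_before("-", ","))               # HYPHEN-MINUS before COMMA?
print(sorted([",", "-"], key=WEIGHTS.get))  # order under the default weights
```

With these two entries the primary weights already differ (0x020D versus 0x0220), so the secondary and tertiary weights never come into play.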

Note that this is the expected sort order according to the Unicode Collation Algorithm with the default weights. If you expect sort order by ASCII byte values, you get a different order, and there are other valid orders too. But if the locale is named "en_US.UTF-8" (or "en_US.utf8", which is the same thing), then you'd probably expect Unicode order. That, however, is between you and your operating system vendor.
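
One way to see both orders side by side is to ask the operating system directly. The Python 3 sketch below is only an illustration: `codepoint_less` and `locale_less` are hypothetical helper names, and the result of the locale-aware comparison depends entirely on the C library and on which locales are installed, so no particular answer is asserted for "en_US.UTF-8". Plain `<` on Python strings compares code points and never consults the locale, which is why the Python check in the question could not settle the matter.

```python
import locale


def codepoint_less(a, b):
    """Plain string comparison: ordered by code point, locale is not consulted."""
    return a < b


def locale_less(a, b, name="en_US.UTF-8"):
    """Compare using the OS collation for `name`; return None if it is not installed."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None  # this locale name is unknown on this machine
    try:
        return locale.strcoll(a, b) < 0
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the old setting


print("code point order:", codepoint_less(",", "-"))
print("C locale:        ", locale_less(",", "-", "C"))
print("en_US.UTF-8:     ", locale_less(",", "-"))  # platform-dependent, may be None
```

Running this on both machines should show whether the difference comes from the operating system's locale data rather than from Postgres itself.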
