Postgres对UTF-8字符的排序 [英] Postgres ordering of UTF-8 characters

查看:178
本文介绍了Postgres对UTF-8字符的排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在构建一个小型应用程序,该数据库在数据库中包含世界语单词,因此我有诸如ĉapelojn和brakhorloĝo之类的单词,带有特殊字符。

I'm building a small app that includes Esperanto words in my database, so I have words like ĉapelojn and brakhorloĝo, with "special" characters.

使用PostgreSQL 9.4.4我有一个具有以下模式的 words 表:

Using PostgreSQL 9.4.4 I have a words table with the following schema:

lingvoj_dev=# \d words
                                      Table "public.words"
   Column    |            Type             |                     Modifiers
-------------+-----------------------------+----------------------------------------------------
 id          | integer                     | not null default nextval('words_id_seq'::regclass)
 translated  | character varying(255)      |
 meaning     | character varying(255)      |
 times_seen  | integer                     |
 inserted_at | timestamp without time zone | not null
 updated_at  | timestamp without time zone | not null
Indexes:
    "words_pkey" PRIMARY KEY, btree (id)

但是以下查询给出了一些奇怪的输出:

But the following query gives some strange output:

lingvoj_dev=# SELECT w."translated" FROM "words" AS w ORDER BY w."translated" desc limit 10; 
translated
------------
 ĉu
 ŝi
 ĝi
 ĉevaloj
 ĉapelojn
 ĉapeloj
 ĉambro
 vostojn
 volas
 viro
(10 rows)

顺序不一致-可以将所有以特殊字符开头的单词放在结尾,但是可以,但所有以starting开头的单词都应该组合在一起,但不是!为什么ŝi和ĝi介于ĉu和ĉevaloj之间?

The ordering is inconsistent - I'd be okay with all of the words starting with special characters being at the end, but all of the words starting with ĉ should be grouped together and they're not! Why do ŝi and ĝi come in between ĉu and ĉevaloj?

服务器编码为UTF8,排序规则为en_AU.UTF-8。

The server encoding is UTF8, and the collation is en_AU.UTF-8.

edit:似乎正在对所有特殊字符进行等效排序-根据每个单词的第二个字符正确排序。如何使PostgreSQL看到ĉ,ŝ和ĝ不相等?

edit: It looks like it's sorting all of the special characters as equivalent - it's ordering correctly based on the second character in each word. How do I make PostgreSQL see that ĉ, ŝ and ĝ are not equivalent?

推荐答案


我是d可以接受以特殊字符
开头的所有单词都可以。...

I'd be okay with all of the words starting with special characters being at the end...

使用排序规则 C :

SELECT w."translated" 
FROM "words" AS w 
ORDER BY w."translated" collate "C" desc limit 10; 

另请参见 order by子句中的行为不同:Oracle与PostgreSQL

使用ORM,查询可能会出现问题。解决方案可能是使用选项LC_COLLATE = C重新创建数据库(如注释中建议的OP)。还有一个选项-更改单个列的排序规则:

The query can be problematic using an ORM. A solution could be to recreate the database with the option LC_COLLATE=C (as OP suggested in a comment). There is one more option - to change a collation for a single column:

ALTER TABLE "words" ALTER COLUMN "translated" TYPE text COLLATE "C";

这篇关于Postgres对UTF-8字符的排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆