How do you sort CJK (Asian) characters in Perl, or with any other programming language?


Question

How do you sort Chinese, Japanese and Korean (CJK) characters in Perl?

As far as I can tell, these languages are conventionally sorted by stroke count, then by radical. There are also methods that sort by pronunciation, but these seem less common.

I tried:

perl -e 'print join(" ", sort qw(工 然 一 人 三 古 二 )), "\n";'
# Prints: 一 三 二 人 古 工 然 which is incorrect

I have also tried Unicode::Collate from CPAN, but its documentation says:

By default, CJK Unified Ideographs are ordered in Unicode codepoint order...

If I could get a database of stroke counts per character, I could easily sort all of the characters, but such a database does not seem to come with Perl, nor is it wrapped in any module I could find.

If you know how to sort CJK in other languages, it would be helpful to mention that in an answer to this question.

Recommended answer

See TR38 (Unicode Standard Annex #38, the Unicode Han Database) for the gory details and corner cases. Sorting is not as easy as you might think, nor as easy as this code sample makes it look.

use 5.010;
use utf8;
use Encode;
use Unicode::Unihan;
my $u = Unicode::Unihan->new;

say encode_utf8 sprintf "Character $_ has the radical #%s and %d residual strokes." , split /[.]/, $u->RSUnicode($_) for qw(工 然 一 人 三 古 二);
__END__
Character 工 has the radical #48 and 0 residual strokes.
Character 然 has the radical #86 and 8 residual strokes.
Character 一 has the radical #1 and 0 residual strokes.
Character 人 has the radical #9 and 0 residual strokes.
Character 三 has the radical #1 and 2 residual strokes.
Character 古 has the radical #30 and 2 residual strokes.
Character 二 has the radical #7 and 0 residual strokes.

See http://en.wikipedia.org/wiki/List_of_Kangxi_radicals for a mapping from radical number to stroke count. Adding the radical's stroke count to the residual strokes reported above gives a total stroke count for the character.
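Putting the two pieces together, the sort asked for in the question could be sketched roughly as follows. This is only an illustration under stated assumptions: the RSUnicode values are hard-coded from the output above rather than fetched from Unicode::Unihan, the radical stroke counts are a hand-copied subset of the Kangxi radical list, and the hash names and the sort_key helper are made up for this example. It also ignores the corner cases TR38 describes, such as characters with more than one radical-stroke entry.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# RSUnicode values ("radical.residual strokes"), copied from the output above.
my %rs_unicode = (
    '工' => '48.0', '然' => '86.8', '一' => '1.0', '人' => '9.0',
    '三' => '1.2',  '古' => '30.2', '二' => '7.0',
);

# Stroke counts of the Kangxi radicals used above (hand-copied subset).
my %radical_strokes = (1 => 1, 7 => 2, 9 => 2, 30 => 3, 48 => 3, 86 => 4);

# Hypothetical helper: returns [total strokes, radical number] for a character.
sub sort_key {
    my ($char) = @_;
    my ($radical, $residual) = split /[.]/, $rs_unicode{$char};
    return [ $radical_strokes{$radical} + $residual, $radical ];
}

# Precompute the keys, then sort by total strokes, breaking ties by radical.
my %key = map { $_ => sort_key($_) } keys %rs_unicode;
my @sorted = sort {
    $key{$a}[0] <=> $key{$b}[0] or $key{$a}[1] <=> $key{$b}[1]
} qw(工 然 一 人 三 古 二);

print join(' ', @sorted), "\n";   # 一 二 人 三 工 古 然
```

For real data you would replace the hard-coded hashes with lookups through Unicode::Unihan and a complete radical table. Unicode::Collate::Locale, which is distributed with Unicode::Collate, also documents a zh__stroke locale that may make hand-built tables unnecessary.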
