How to sort Japanese like Excel


Problem description

I want to sort Japanese words (kanji) the way the sort feature in Excel does. I have tried several ways of sorting Japanese text in PHP, but the results never match Excel's 100%.

First, I tried converting kanji to katakana with this library (https://osdn.net/projects/igo-php/), but in some cases the result differs from Excel's. I want to sort these words in ascending order:

けやきの家

高森台病院

みのりの里

My result:

けやきの家

高森台病院

みのりの里

Excel result:

けやきの家

みのりの里

高森台病院

Second, I tried another approach using this function:

 mb_convert_kana($text, "KVc", "utf-8");

The sorting result is correct for the words above, but some other cases are still wrong:

米田病院

米田病院

高森台病院

My result:

米田病院

米田病院

高森台病院

Excel result:

高森台病院

米田病院

米田病院

Do you have any ideas about this? (Sorry for my English.) Thank you.

Recommended answer

First, Japanese kanji cannot be sorted meaningfully on their own. You can sort them by code point, but that order carries no meaning.

Using Igo (or any other morphological analysis library) sounds like a good solution, though it cannot be perfect. Your first sort result also looks fine to me. Why do you want the words sorted in Excel's order?

In Excel, a cell keeps the phonetic information that was captured when the user originally typed the text through a Japanese IME (Input Method Editor), and that phonetic information is used when sorting. Not every cell is typed manually through an IME, so some cells may carry no information about how their kanji are read. As a result, sorting kanji in Excel can be quite unpredictable. (When sorting really matters, we usually add a separate yomigana field, in either hiragana or katakana, and sort by that column.)
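The yomigana approach mentioned above could look roughly like the following sketch. The `yomi` field name and the readings themselves are assumptions made for illustration (they are not taken from the question's data), and the byte-order fallback is only a rough stand-in for when the intl extension is missing.

```php
<?php
// Sketch: sort names by a separate yomigana (reading) field.
// The 'yomi' key and the readings are assumed for illustration only;
// real data should carry readings entered or verified by a person.
$rows = [
    ['name' => 'みのりの里', 'yomi' => 'みのりのさと'],
    ['name' => '高森台病院', 'yomi' => 'たかもりだいびょういん'],
    ['name' => 'けやきの家', 'yomi' => 'けやきのいえ'],
];

// Prefer the intl Collator; otherwise fall back to byte order,
// which only approximates kana order.
$collator = class_exists('Collator') ? new Collator('ja_JP') : null;

usort($rows, function (array $a, array $b) use ($collator): int {
    if ($collator !== null) {
        return (int) $collator->compare($a['yomi'], $b['yomi']);
    }
    return strcmp($a['yomi'], $b['yomi']);
});

// Names in the order of their readings.
$sortedNames = array_column($rows, 'name');
print_r($sortedNames);
```

Because the comparison only ever sees the kana reading, the kanji in the `name` field no longer influence the order.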

The second method, mb_convert_kana(), is beside the point. That function normalizes hiragana/katakana, because for historical reasons there are two sets of kana letters (full-width and half-width). Applying it to your Japanese text changes only the kana parts. If the result happened to match your expectation, that must be a coincidence.
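A small sketch of that behavior, assuming the mbstring extension is loaded. The sample strings are illustrative; the option string is the one used in the question.

```php
<?php
// Sketch: mb_convert_kana() rewrites kana only; kanji pass through unchanged.
// Option letters, as documented for mb_convert_kana():
//   K: half-width katakana -> full-width katakana
//   V: collapse voiced sound marks into the preceding kana (used with K or H)
//   c: full-width katakana -> full-width hiragana
$samples = ['ケヤキの家', '高森台病院'];

$converted = [];
foreach ($samples as $text) {
    $converted[] = mb_convert_kana($text, 'KVc', 'UTF-8');
}

print_r($converted);
```

The katakana in the first string becomes hiragana, while the all-kanji string comes back untouched. A plain string sort afterwards still compares the kanji by code point, which is why any match with Excel's order is accidental.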

You must first define which Excel Japanese sort order your customer requires. I will be happy to help once that is clear.

[Update]

As the OP commented, mb_convert_kana() was being used to sort mixed hiragana/katakana. For that purpose, I suggest using the php_intl Collator. For example:

<?php

// demo: Japanese(kana) sort by php_intl Collator

if (version_compare(PHP_VERSION, '5.3.0', '<')) {
    exit ('php_intl extension is available on PHP 5.3.0 or later.');
}    
if (!class_exists('Collator')) {
    exit ('You need to install php_intl extension.');
}

$collator = new Collator('ja_JP');
$textArray = [
  'カキクケコ',
  '日本語',
  'アアト',
  'Alphabet',
  'アイランド',
  'はひふへほ',
  'あいうえお',
  '漢字',
  'たほいや',
  'さしみじょうゆ',
  'Roma',
  'ラリルレロ',
  'アート',
];

$result = $collator->sort($textArray);
if ($result === false) {
    echo "sort failed" . PHP_EOL;
    exit();
}

var_dump($textArray);

This sorts an array of mixed hiragana/katakana strings. The result is:

array(13) {
  [0]=>
  string(8) "Alphabet"
  [1]=>
  string(4) "Roma"
  [2]=>
  string(9) "アート"
  [3]=>
  string(9) "アアト"
  [4]=>
  string(15) "あいうえお"
  [5]=>
  string(15) "アイランド"
  [6]=>
  string(15) "カキクケコ"
  [7]=>
  string(21) "さしみじょうゆ"
  [8]=>
  string(12) "たほいや"
  [9]=>
  string(15) "はひふへほ"
  [10]=>
  string(15) "ラリルレロ"
  [11]=>
  string(6) "漢字"
  [12]=>
  string(9) "日本語"
}

You do not need to normalize them yourself. Both PHP (with the php_intl extension) and databases (such as MySQL) know how to sort text in many languages, so you do not need to write that logic.

Note that this does not solve the original issue, which is sorting kanji.
