Recognizing text as Simplified vs. Traditional Chinese
Question
Given a block of text that's known to be Chinese and encoded in UTF-8, is there a way to determine whether it's Simplified or Traditional?
Recommended answer
I don't know if this will work, but I'd try using iconv to see whether the text converts cleanly to each legacy charset, comparing the results of the same conversion with //TRANSLIT and with //IGNORE. If the two results match, the conversion did not encounter any characters that fail to translate, so the text fits that charset: Big5 for Traditional, GB2312 for Simplified.
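// Traditional: does the text convert to Big5 without losing any characters?
$test1 = iconv("UTF-8", "big5//TRANSLIT", $text);
$test2 = iconv("UTF-8", "big5//IGNORE", $text);
// Guard against both calls failing, since false == false would look like a match.
if ($test1 !== false && $test1 === $test2) {
    echo 'traditional';
} else {
    // Simplified: run the same check against GB2312.
    $test3 = iconv("UTF-8", "gb2312//TRANSLIT", $text);
    $test4 = iconv("UTF-8", "gb2312//IGNORE", $text);
    if ($test3 !== false && $test3 === $test4) {
        echo 'simplified';
    } else {
        echo 'Failed to match either traditional or simplified';
    }
}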
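A variation on the same idea, offered only as a sketch: instead of comparing the //TRANSLIT and //IGNORE outputs, call iconv with no suffix and treat a `false` return as "this text does not fit the charset". How the suffixes behave can differ between iconv implementations, so the plain call may be easier to reason about. The function names `convertsCleanly` and `classifyChinese` are hypothetical, and the `'ambiguous'` result is an addition not found in the original answer: it covers text made up only of characters that exist in both Big5 and GB2312, which the charset test alone cannot tell apart.

```php
<?php
// Returns true if every character in $text can be represented in $charset.
// Without //TRANSLIT or //IGNORE, iconv() returns false (and raises a notice,
// suppressed here with @) when it meets a character it cannot convert.
function convertsCleanly(string $text, string $charset): bool
{
    return @iconv('UTF-8', $charset, $text) !== false;
}

// Hypothetical helper: classifies UTF-8 Chinese text by which legacy
// charset it fits. Assumes the input is already known to be Chinese.
function classifyChinese(string $text): string
{
    $fitsBig5   = convertsCleanly($text, 'BIG5');    // Traditional charset
    $fitsGb2312 = convertsCleanly($text, 'GB2312');  // Simplified charset

    if ($fitsBig5 && $fitsGb2312) {
        return 'ambiguous';   // only characters shared by both charsets
    }
    if ($fitsBig5) {
        return 'traditional';
    }
    if ($fitsGb2312) {
        return 'simplified';
    }
    return 'unknown';         // mixed text, or characters outside both charsets
}
```

For example, `classifyChinese("漢語")` should come back as `'traditional'` and `classifyChinese("汉语")` as `'simplified'`, while a string such as `"中文"` uses only shared characters and lands in `'ambiguous'`. Note that this ordering differs from the original snippet, which reports `traditional` for any text that fits Big5, including text made up only of shared characters.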