Recognizing text as Simplified vs. Traditional Chinese

Question

Given a block of text that's known to be Chinese and encoded in UTF-8, is there a way to determine if it's Simplified or Traditional?

Recommended answer

I don't know if this will work, but I'd try using iconv to see whether the text converts cleanly into each charset, comparing the results of the same conversion with //TRANSLIT and //IGNORE. If the two results match, the conversion hasn't encountered any characters that fail to translate, so the text fits that charset: Big5 for Traditional, GB2312 for Simplified.

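// $text holds the UTF-8 input to classify.
// If //TRANSLIT and //IGNORE give the same result, nothing had to be
// substituted or dropped, so every character exists in the target charset.
// A failed conversion (iconv() returns false) is never counted as a match.
$test1 = iconv("UTF-8", "big5//TRANSLIT", $text);
$test2 = iconv("UTF-8", "big5//IGNORE", $text);
if ($test1 !== false && $test1 === $test2) {
   echo 'traditional';
} else {
   $test3 = iconv("UTF-8", "gb2312//TRANSLIT", $text);
   $test4 = iconv("UTF-8", "gb2312//IGNORE", $text);
   if ($test3 !== false && $test3 === $test4) {
      echo 'simplified';
   } else {
      echo 'Failed to match either traditional or simplified';
   }
}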
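The //IGNORE suffix behaves differently across iconv implementations (on some glibc-based systems PHP's iconv() returns false instead of the filtered string), so one variant of the same "does it convert cleanly?" idea is to attempt a strict conversion with no suffix and treat a false return as "contains a character outside this charset". This is a minimal sketch, not the answer's exact code: the helper names convertsCleanly() and classifyChinese() are hypothetical, and it assumes the iconv extension is loaded and accepts the charset names "BIG5" and "GB2312".

```php
<?php
// Hypothetical helper: true if every character of $text can be represented
// in $charset. With no //TRANSLIT or //IGNORE suffix, iconv() returns false
// as soon as it meets a character it cannot convert (or invalid UTF-8).
function convertsCleanly(string $text, string $charset): bool
{
    // @ silences the notice iconv() raises on an unconvertible character.
    return @iconv('UTF-8', $charset, $text) !== false;
}

// Hypothetical classifier using the same order as the answer:
// try Big5 (Traditional) first, then GB2312 (Simplified).
function classifyChinese(string $text): string
{
    if (convertsCleanly($text, 'BIG5')) {
        return 'traditional';
    }
    if (convertsCleanly($text, 'GB2312')) {
        return 'simplified';
    }
    return 'unknown';
}

foreach (['漢語', '汉语', '漢语'] as $sample) {
    echo $sample, ': ', classifyChinese($sample), PHP_EOL;
}
```

Because Big5 is tried first, text made only of characters both charsets share (plain ASCII, or common characters such as 中文) is reported as traditional; the original snippet has the same property.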
