How to identify UTF-8 encoded strings

Views: 174 Published: 2017/8/16 22:25:51 Tags: unicode encoding utf-8

Question

What's the best way to identify whether a string is (or might be) UTF-8 encoded? The Win32 API IsTextUnicode isn't of much help here. Also, the string will not have a UTF-8 BOM, so that cannot be checked for. And yes, I know that only characters above the ASCII range are encoded with more than one byte.

Solution

chardet is the character set detection library developed by Mozilla and used in Firefox. Its source code is available.

jchardet is a Java port of the source of Mozilla's automatic charset detection algorithm.

NCharDet is a .NET (C#) port of a Java port of the C++ code used in the Mozilla and Firefox browsers.

A CodeProject C# sample uses Microsoft's MLang for character encoding detection.

UTRAC is a command-line tool and library written in C++ to detect string encoding.

cpdetector is a Java library used for encoding detection.

Another useful post points to many libraries that help you determine character encoding: http://fredeaker.blogspot.com/2007/01/character-encoding-detection.html

You could also take a look at the related question "How Can I Best Guess the Encoding when the BOM (Byte Order Mark) is Missing?"; it has some useful content.
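For the narrower yes/no question asked here, a strict validity check may be enough before reaching for a full detector. UTF-8 multi-byte sequences follow a rigid lead-byte and continuation-byte pattern, so text in a legacy single-byte encoding rarely decodes cleanly as UTF-8. Below is a minimal sketch in Python, assuming the input is available as raw bytes. The helper name `looks_like_utf8` is hypothetical and is not part of any library listed above. Passing the check is strong evidence rather than proof, especially for very short inputs.

```python
def looks_like_utf8(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'not-utf-8'.

    Hypothetical helper for illustration; not taken from any of the
    libraries mentioned in the answer.
    """
    try:
        # Strict decoding rejects stray continuation bytes, truncated
        # sequences, overlong encodings, and encoded surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "not-utf-8"

    # Pure ASCII is valid UTF-8, but it is equally valid in most legacy
    # encodings, so it gives no evidence either way.
    if all(byte < 0x80 for byte in data):
        return "ascii"

    # At least one well-formed multi-byte sequence was present.
    return "utf-8"


print(looks_like_utf8(b"plain text"))                     # ascii
print(looks_like_utf8("naïve café".encode("utf-8")))      # utf-8
print(looks_like_utf8("naïve café".encode("latin-1")))    # not-utf-8
```

Returning three outcomes instead of a boolean keeps the ASCII-only case visible: such input cannot confirm or rule out UTF-8, which is where the statistical detectors listed above become useful.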
