Auto-Detect Character Encoding in Java
Question
This seems to be a fairly common issue, but I haven't yet been able to find a solution, perhaps because it comes in so many flavors. Here it is, though. I'm trying to read some comma-delimited files (occasionally the delimiter is something more unusual than a comma, but commas will do for now).
The files are supposed to be standardized across the industry, but lately we've seen files arriving in many different character sets. I'd like to set up a BufferedReader that compensates for this.
What is a fairly standard way of doing this, and of detecting whether it succeeded?
My first thought is to loop through character sets, from simple to complex, until I can read the file without an exception. Not exactly ideal, though...
Thanks for your attention.
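The trial-and-error idea from the question can be sketched as below, assuming Java 11+ and files small enough to read into memory. One catch: an `InputStreamReader` created from a charset silently replaces malformed bytes with U+FFFD instead of throwing, so the loop needs a decoder configured with `CodingErrorAction.REPORT` to get the exception the approach relies on. The class name `TrialDecode` and the candidate list are hypothetical. A clean decode only proves the bytes are *valid* in a charset, not that it is the *right* one: ISO-8859-1 accepts every byte sequence, so it only makes sense as the last resort.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class TrialDecode {

    // Hypothetical candidate list, tried in order from strict to permissive.
    // ISO-8859-1 maps every byte to a character, so it never fails and must come last.
    static final List<Charset> CANDIDATES = List.of(
            StandardCharsets.US_ASCII,
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1);

    /** Returns the first candidate that decodes every byte without error, or null. */
    static Charset firstStrictMatch(byte[] data) {
        for (Charset cs : CANDIDATES) {
            // A fresh decoder set to REPORT throws instead of substituting U+FFFD.
            CharsetDecoder decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(data));
                return cs;
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate.
            }
        }
        return null;
    }

    /** Reads the whole file, picks a charset, and wraps the bytes in a BufferedReader. */
    static BufferedReader openReader(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        Charset cs = firstStrictMatch(data);
        if (cs == null) {
            throw new IOException("No candidate charset could decode " + file);
        }
        return new BufferedReader(new InputStreamReader(new ByteArrayInputStream(data), cs));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TrialDecode path/to/file.csv
        try (BufferedReader reader = openReader(Path.of(args[0]))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Naive split: does not handle quoted fields containing commas.
                String[] fields = line.split(",", -1);
                System.out.println(fields.length + " fields: " + line);
            }
        }
    }
}
```

Passing a strict `CharsetDecoder` to `new InputStreamReader(in, decoder)` is the streaming alternative: the reader then throws `MalformedInputException` mid-read, which avoids buffering the whole file but means you may only discover the problem after consuming part of it.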
Answer
Mozilla's universalchardet is reputed to be the most effective detector out there, and juniversalchardet is its Java port. There is one more port as well. See this Stack Overflow question for more information: Character Encoding Detection Algorithm
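A minimal sketch of using juniversalchardet to pick the charset before opening the reader, assuming the library (package `org.mozilla.universalchardet`) is on the classpath. It follows the library's feed-until-done pattern: `handleData` on chunks, then `dataEnd`, then `getDetectedCharset`, which returns `null` when there is no confident guess. The class `DetectCharset`, its methods and the fallback parameter are hypothetical. `Files.newBufferedReader` is used deliberately because its reader throws an `IOException` on malformed or unmappable input, so a wrong guess surfaces as an error rather than as silently garbled text, which covers the "detect whether it was successful" part of the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.mozilla.universalchardet.UniversalDetector;

public class DetectCharset {

    /** Streams the file through the detector; returns its guess, or null if it has none. */
    static String guessCharset(Path file) throws IOException {
        UniversalDetector detector = new UniversalDetector(null); // null: no listener callback
        byte[] buf = new byte[4096];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // Stop feeding once the detector has reached a confident answer.
            while ((n = in.read(buf)) > 0 && !detector.isDone()) {
                detector.handleData(buf, 0, n);
            }
        }
        detector.dataEnd();
        return detector.getDetectedCharset();
    }

    /** Opens the file with the detected charset, or with the fallback if detection fails. */
    static BufferedReader openReader(Path file, Charset fallback) throws IOException {
        String name = guessCharset(file);
        Charset cs = fallback;
        try {
            if (name != null && Charset.isSupported(name)) {
                cs = Charset.forName(name);
            }
        } catch (IllegalArgumentException e) {
            // Detector returned a name this JVM cannot parse; keep the fallback.
        }
        // Files.newBufferedReader reports malformed input instead of replacing it.
        return Files.newBufferedReader(file, cs);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DetectCharset path/to/file.csv
        Path file = Path.of(args[0]);
        System.out.println("Detected: " + guessCharset(file));
        try (BufferedReader reader = openReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Detection is statistical: on short or mostly-ASCII files the detector may return `null` or a plausible but wrong single-byte charset, so the fallback still matters. Also note that Java's UTF-8 decoder does not strip a leading byte-order mark, so the first header field of a UTF-8 CSV written with a BOM will start with `\uFEFF`.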