Auto-Detect Character Encoding in Java


Question


This seems to be a fairly common issue, but I've not yet been able to find a solution, perhaps because it comes in so many flavors. Here it is, though. I'm trying to read some comma-delimited files (occasionally the delimiters are a bit more unusual than commas, but commas will suffice for now).

The files are supposed to be standardized across the industry, but lately we've seen files coming in with many different character sets. I'd like to be able to set up a BufferedReader to compensate for this.

What is a reasonably standard way of doing this, and of detecting whether it was successful or not?

My first thoughts on this approach are to loop through character sets simple->complex until I can read the file without an exception. Not exactly ideal though...

Thanks for your attention.
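
The trial-and-error idea described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the class name `CharsetProbe`, its helper methods, and the candidate list are hypothetical and not from the original post. It relies on `CharsetDecoder` with `CodingErrorAction.REPORT`, which throws instead of silently substituting replacement characters. Note the main limitation: ISO-8859-1 maps every byte to a character, so it never fails and can only serve as a last-resort fallback, not as proof that the guess is right.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper class, named here only for illustration.
class CharsetProbe {

    // True if every byte sequence in data is valid in the given charset.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns the first candidate, in list order, that decodes without error.
    static Optional<Charset> firstStrictMatch(byte[] data, List<Charset> candidates) {
        for (Charset candidate : candidates) {
            if (decodesCleanly(data, candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Wraps already-loaded bytes in a BufferedReader using the chosen charset.
    static BufferedReader openReader(byte[] data, Charset charset) {
        return new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset));
    }

    public static void main(String[] args) throws IOException {
        // Assumed candidate order: strictest first, catch-all last.
        List<Charset> candidates = Arrays.asList(
                StandardCharsets.US_ASCII,
                StandardCharsets.UTF_8,
                StandardCharsets.ISO_8859_1);

        // Sample CSV content; in practice the bytes would come from the file.
        byte[] data = "name,city\nZo\u00eb,Krak\u00f3w\n".getBytes(StandardCharsets.UTF_8);

        Charset charset = firstStrictMatch(data, candidates)
                .orElseThrow(() -> new IllegalStateException("no candidate matched"));

        try (BufferedReader reader = openReader(data, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(charset + ": " + line);
            }
        }
    }
}
```

A clean decode only shows that the bytes are valid in that charset, not that the text is meaningful, so this approach cannot tell apart single-byte encodings that accept the same bytes. A statistical detector is the usual answer to that gap.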

Solution

Mozilla's universalchardet is supposed to be an efficient detector. juniversalchardet is its Java port. There is one more port as well. For more information, see the SO question "Character Encoding Detection Algorithm".
