Auto-Detect Character Encoding in Java

Question

This seems to be a fairly common issue, but I haven't been able to find a solution yet, perhaps because it comes in so many flavors. Here it is, though. I'm trying to read some comma-delimited files (occasionally the delimiters are a bit more unusual than commas, but commas will suffice for now).

The files are supposed to be standardized across the industry, but lately we've been receiving files in many different character sets. I'd like to be able to set up a BufferedReader to compensate for this.

What is a reasonably standard way of doing this, and of detecting whether it was successful or not?

My first thought is to loop through character sets, from simple to complex, until I can read the file without an exception. Not exactly ideal, though...
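To make that idea concrete, here is a minimal sketch of the loop, using only the JDK. One caveat matters: a plain InputStreamReader silently replaces malformed bytes instead of throwing, so the sketch uses a CharsetDecoder set to CodingErrorAction.REPORT to get an exception at all. The class name CharsetProbe, the methods detect and open, and the candidate list are all hypothetical names chosen for illustration. The approach also assumes the file fits in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

final class CharsetProbe {

    // Hypothetical candidate order, strictest first. ISO-8859-1 maps every
    // byte to a character, so it never fails and only makes sense last.
    static final List<Charset> CANDIDATES = List.of(
            StandardCharsets.US_ASCII,
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1);

    // Returns the first candidate that decodes the bytes without error,
    // or an empty Optional if every candidate rejects them.
    static Optional<Charset> detect(byte[] data, List<Charset> candidates) {
        for (Charset cs : candidates) {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            CharsetDecoder decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(data));
                return Optional.of(cs);
            } catch (CharacterCodingException e) {
                // This charset cannot represent the bytes; try the next one.
            }
        }
        return Optional.empty();
    }

    // Reads the whole file, picks a charset, and wraps the decoded text
    // in a BufferedReader. Fails loudly when nothing matches.
    static BufferedReader open(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        Charset cs = detect(data, CANDIDATES).orElseThrow(
                () -> new IOException("No candidate charset decodes " + file));
        return new BufferedReader(new StringReader(new String(data, cs)));
    }
}
```

A successful decode only shows that the bytes are valid in that charset, not that the charset is the right one. Many single-byte encodings accept any input, which is why trial decoding alone can't tell them apart.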

Thanks for your attention.

Recommended answer

Mozilla's universalchardet is supposed to be one of the most effective detectors out there. juniversalchardet is its Java port, and there is one more port as well. For more information, read the Stack Overflow question "Character Encoding Detection Algorithm".
