How to detect character set encoding in Java?


Question

Does anybody know if there is a simple way to detect character set encoding in Java? It seems to me that some programs are able to detect which character set a given piece of data uses, or at least make an approximation.

I suppose the underlying mechanism would have to decode the data in each candidate character set and pick whichever one yields the fewest undefined characters, breaking ties in favor of the more common character set.

Any ideas?
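
The mechanism guessed at in the question can be sketched roughly as follows. This is only an illustration, not how any particular detector works: the class name `CharsetGuesser` and its methods are hypothetical, and it assumes "undefined characters" can be approximated by counting the U+FFFD replacement characters that Java substitutes for undecodable bytes. Candidates are passed in order of preference, so the earlier (more common) charset wins a tie.

```java
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper: a sketch of "decode with each charset, keep the one
// with the fewest undefined characters". Names are illustrative only.
class CharsetGuesser {

    // Decodes the bytes and counts U+FFFD, the character that
    // new String(byte[], Charset) substitutes for undecodable input.
    // Caveat: text that legitimately contains U+FFFD is also counted.
    static int countReplacements(byte[] data, Charset charset) {
        String decoded = new String(data, charset);
        int count = 0;
        for (int i = 0; i < decoded.length(); i++) {
            if (decoded.charAt(i) == '\uFFFD') {
                count++;
            }
        }
        return count;
    }

    // Returns the candidate with the fewest replacement characters.
    // The strict '<' keeps the earliest candidate on a tie, so list the
    // candidates from most to least common. Returns null for an empty list.
    static Charset guess(byte[] data, List<Charset> candidates) {
        Charset best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Charset candidate : candidates) {
            int count = countReplacements(data, candidate);
            if (count < bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }
}
```

Note that single-byte charsets such as ISO-8859-1 accept every byte value, so they never produce replacement characters. They are best placed last in the candidate list, otherwise they would win every tie.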

Recommended answer

To find out whether the data is in one of the Unicode formats (UTF-8, UTF-16, etc.), you can read it as a byte stream and check the first few bytes for a byte order mark (BOM), which is at most 4 bytes long and is different for each encoding.

For example:

For UTF-8, the first 3 bytes will be EF, BB, BF.
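
A minimal sketch of that BOM check, assuming the data is already in a byte array. The class name `BomSniffer` and its methods are made up for illustration. The byte sequences are the standard Unicode BOMs; data without a BOM (which is common, especially for UTF-8) simply yields `null`.

```java
// Hypothetical helper: maps a leading byte order mark to a charset name.
class BomSniffer {

    // Returns the charset name implied by the BOM, or null if none is found.
    // The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 marks,
    // because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM
    // (FF FE). Those same four bytes could also be UTF-16LE text starting
    // with U+0000, so this remains a guess.
    static String detect(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the leading bytes as unsigned values against the prefix.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For a file, it is enough to read the first 4 bytes and pass them to `detect`; a `null` result only means there is no BOM, not that the data is not Unicode.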

For encodings other than the Unicode ones, I am not sure...
