Java Text File Encoding

Question

I have a text file that can be ANSI (with the ISO-8859-2 charset), UTF-8, or UCS-2 big- or little-endian.

Is there any way to detect the encoding of the file to read it properly?

Or is it possible to read a file without giving the encoding? (and it reads the file as it is)

(There are several programs that can detect and convert the encoding/format of text files.)

Recommended answer

UTF-8 and UCS-2/UTF-16 can be distinguished reasonably easily via a byte order mark at the start of the file. If this exists then it's a pretty good bet that the file is in that encoding - but it's not a dead certainty. You may well also find that the file is in one of those encodings, but doesn't have a byte order mark.
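
As a rough illustration of the byte order mark check, here is a minimal sketch. The class name `BomSniffer` and the method `detectBom` are made up for this example; the byte sequences are the standard BOMs for UTF-8 (EF BB BF), UTF-16 big-endian (FE FF) and UTF-16 little-endian (FF FE). An empty result only means no BOM was found, not that the file is in some other encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of the JDK.
class BomSniffer {

    /** Returns the charset implied by a leading byte order mark, or empty if none is present. */
    static Optional<Charset> detectBom(byte[] head) {
        // UTF-8 BOM: EF BB BF
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        // UTF-16 big-endian BOM: FE FF
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        // UTF-16 little-endian BOM: FF FE
        // (FF FE 00 00 is also the UTF-32LE BOM; this sketch ignores UTF-32.)
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00}; // BOM + "A" in UTF-16LE
        System.out.println(detectBom(sample).isPresent());
    }
}
```

In practice you would pass in the first few bytes of the file, and skip the BOM bytes before decoding the rest.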

I don't know much about ISO-8859-2, but I wouldn't be surprised if almost every file is a valid text file in that encoding. The best you'll be able to do is check it heuristically. Indeed, the Wikipedia page talking about it would suggest that only byte 0x7f is invalid.
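
One possible heuristic, sketched below, is to try a strict UTF-8 decode first and fall back to ISO-8859-2 only if the bytes are not well-formed UTF-8. This particular strategy is an assumption of the example rather than something the answer prescribes, and the names `EncodingGuess`, `isValidUtf8` and `guess` are hypothetical. It relies on `CharsetDecoder` with `CodingErrorAction.REPORT`, which throws on malformed input instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK.
class EncodingGuess {

    /** True if the bytes form a well-formed UTF-8 sequence (pure ASCII also qualifies). */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** A guess, not a proof: almost any byte sequence is acceptable as ISO-8859-2. */
    static Charset guess(byte[] data) {
        return isValidUtf8(data) ? StandardCharsets.UTF_8 : Charset.forName("ISO-8859-2");
    }

    public static void main(String[] args) {
        byte[] latin2 = {(byte) 0xF8}; // 0xF8 can never appear in well-formed UTF-8
        System.out.println(guess(latin2));
    }
}
```

The order matters: non-ASCII text that happens to be valid UTF-8 is rarely anything else, whereas the ISO-8859-2 check would accept nearly everything, so it only makes sense as the fallback.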

There's no such thing as reading a file "as it is" and yet getting text out of it: a file is a sequence of bytes, so you have to apply a character encoding in order to decode those bytes into characters.
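
To make that concrete, the small sketch below decodes the same two bytes with two different charsets and gets two different strings. The class name `DecodeDemo` and its `decode` method are made up for the example. When reading a real file, the same choice is made by passing a `Charset` to something like `Files.newBufferedReader(path, charset)`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
class DecodeDemo {

    /** Decodes raw bytes into text using the given charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC5, (byte) 0x99};

        // As UTF-8 these two bytes are a single character, U+0159.
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);

        // As ISO-8859-2 each byte is its own character, so the result has two chars.
        String asLatin2 = decode(raw, Charset.forName("ISO-8859-2"));

        System.out.println(asUtf8.length());
        System.out.println(asLatin2.length());
    }
}
```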
