Java Text File Encoding


Question

I have a text file that may be ANSI (with the ISO-8859-2 charset), UTF-8, or UCS-2 in big- or little-endian byte order.

Is there any way to detect the encoding of the file so that it can be read properly?

Or is it possible to read a file without specifying the encoding, so that the file is read as it is?

(There are several programs that can detect and convert the encoding/format of text files.)

Recommended answer

UTF-8 and UCS-2/UTF-16 can be distinguished reasonably easily via a byte order mark at the start of the file. If one is present, it's a pretty good bet that the file is in that encoding, but it's not a dead certainty. You may well also find that a file is in one of those encodings but has no byte order mark.
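As a rough illustration of the byte order mark check, here is a minimal sketch in Java. The class name BomSniffer and the method detectBom are made up for this example. It only inspects the first few bytes, and a null result simply means "no BOM found", not "not Unicode".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the JDK.
final class BomSniffer {

    /**
     * Returns the charset suggested by a leading byte order mark,
     * or null when the data does not start with a known BOM.
     */
    static Charset detectBom(byte[] head) {
        // UTF-8 BOM: EF BB BF
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            // UTF-16 / UCS-2 big-endian BOM: FE FF
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE;
            }
            // UTF-16 / UCS-2 little-endian BOM: FF FE
            // (FF FE 00 00 would also be a UTF-32LE BOM; this sketch ignores that case.)
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE;
            }
        }
        return null; // no BOM: the encoding has to be guessed some other way
    }
}
```

When a BOM is found, the caller would normally skip those leading bytes before decoding the rest of the file with the returned charset.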

I don't know much about ISO-8859-2, but I wouldn't be surprised if almost every file were a valid text file in that encoding. The best you can do is check it heuristically. Indeed, the Wikipedia page on ISO-8859-2 suggests that only byte 0x7f is invalid.
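One possible heuristic, sketched below, is to try a strict UTF-8 decode first and fall back to ISO-8859-2 only if the bytes are not well-formed UTF-8. This is just one assumption about how such a check might look: the names EncodingGuess and decodeWithFallback are hypothetical, the sketch assumes the JRE provides the ISO-8859-2 charset, and it does not handle files without a BOM that are actually UTF-16.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the JDK.
final class EncodingGuess {

    /**
     * Decodes the bytes as UTF-8 if they are well-formed UTF-8,
     * otherwise decodes them as ISO-8859-2.
     */
    static String decodeWithFallback(byte[] data) {
        try {
            // REPORT makes the decoder throw instead of silently
            // substituting U+FFFD for malformed input.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
        } catch (CharacterCodingException e) {
            // Assumption: anything that is not valid UTF-8 is ISO-8859-2.
            return new String(data, Charset.forName("ISO-8859-2"));
        }
    }
}
```

The order matters: since an ISO-8859-2 decode accepts practically any byte sequence, it can only serve as the fallback, never as the first test.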

There's no such thing as reading a file "as it is" and still getting text out of it: a file is a sequence of bytes, so you have to apply a character encoding to decode those bytes into characters.
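To make that point concrete, the following small sketch decodes the same two bytes with three different charsets and gets three different strings. The class name SameBytesDifferentText is invented for the example, and it assumes the JRE provides the ISO-8859-2 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration.
final class SameBytesDifferentText {

    /** Decodes raw bytes into text using an explicitly chosen charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC4, (byte) 0x85};

        // UTF-8: the two bytes form one character, U+0105.
        String utf8 = decode(raw, StandardCharsets.UTF_8);
        // ISO-8859-2: each byte is its own character, so the result has two chars.
        String latin2 = decode(raw, Charset.forName("ISO-8859-2"));
        // UTF-16BE: the two bytes form the single code unit U+C485.
        String utf16 = decode(raw, StandardCharsets.UTF_16BE);

        System.out.println(utf8.length() + " " + latin2.length() + " " + utf16.length());
    }
}
```

The bytes never change; only the charset chosen by the reader determines which characters come out.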
