Writing UTF-8 without BOM

Problem description

This code

// getBytes() without arguments encodes with the platform default charset
try (OutputStream out = new FileOutputStream(new File("C:/file/test.txt"))) {
    out.write("A".getBytes());
}

and this code,

// Encodes explicitly as UTF-8; no BOM is written
try (OutputStream out = new FileOutputStream(new File("C:/file/test.txt"))) {
    out.write("A".getBytes(StandardCharsets.UTF_8));
}

produce the same result (in my opinion), which is UTF-8 without BOM. However, Notepad++ does not show any information about the encoding. I expect Notepad++ to show "Encode in UTF-8 without BOM" here, but no encoding is selected in the "Encoding" menu.

Now, this code writes the file in UTF-8 with a BOM:

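// Write the UTF-8 BOM (239, 187, 191 = EF BB BF), then the UTF-8 encoded text
try (OutputStream out = new FileOutputStream(new File("C:/file/test.txt"))) {
    byte[] bom = { (byte) 239, (byte) 187, (byte) 191 };
    out.write(bom);
    out.write("A".getBytes(StandardCharsets.UTF_8));
}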

In this case, Notepad++ does display the encoding as "Encode in UTF-8".
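The BOM is the only in-band marker here: a reader can recognize it by checking whether the file starts with the bytes EF BB BF. The following is a minimal sketch of such a check, assuming the whole file fits in memory; the class and method names (`BomCheck`, `startsWithUtf8Bom`) are hypothetical, and this is not a description of how Notepad++ itself is implemented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomCheck {
    // The UTF-8 byte order mark: EF BB BF (decimal 239, 187, 191)
    private static final byte[] UTF8_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    // Returns true if the data starts with the UTF-8 BOM
    static boolean startsWithUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Two temporary files: "A" with and without a leading BOM
        Path withBom = Files.createTempFile("with-bom", ".txt");
        Path withoutBom = Files.createTempFile("without-bom", ".txt");
        try {
            Files.write(withBom, new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 65 });
            Files.write(withoutBom, new byte[] { 65 });

            System.out.println(startsWithUtf8Bom(Files.readAllBytes(withBom)));    // true
            System.out.println(startsWithUtf8Bom(Files.readAllBytes(withoutBom))); // false
        } finally {
            Files.delete(withBom);
            Files.delete(withoutBom);
        }
    }
}
```

Without the BOM, the second file offers nothing comparable to check, which is the situation in the first two snippets.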

Question: What is wrong with the first two snippets, which are supposed to write the file in UTF-8 without a BOM? Is my Java code doing the right thing? If so, is there a problem with how Notepad++ detects the encoding?

Is Notepad++ just guessing?

Recommended answer

Writing "A" as UTF-8 without a BOM produces exactly the same file as writing "A" in ASCII, ISO-8859-*, or any other ASCII-compatible encoding. That file contains a single byte with the decimal value 65.
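To make this concrete, here is a small sketch that writes "A" to temporary files in three ASCII-compatible encodings and compares the raw bytes that end up on disk. The class name `SameBytes` and the helper `writeAndRead` are hypothetical, introduced only for this illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class SameBytes {
    // Writes text to a temporary file using the given charset and returns the file's raw bytes
    static byte[] writeAndRead(String text, Charset charset) throws IOException {
        Path file = Files.createTempFile("enc-test", ".txt");
        try {
            Files.write(file, text.getBytes(charset));
            return Files.readAllBytes(file);
        } finally {
            Files.delete(file);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = writeAndRead("A", StandardCharsets.UTF_8);
        byte[] ascii = writeAndRead("A", StandardCharsets.US_ASCII);
        byte[] latin1 = writeAndRead("A", StandardCharsets.ISO_8859_1);

        System.out.println(Arrays.toString(utf8)); // [65]
        // All three files are byte-for-byte identical
        System.out.println(Arrays.equals(utf8, ascii) && Arrays.equals(utf8, latin1)); // true
    }
}
```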

Think of it this way:

  • "A".getBytes("UTF-8") returns new byte[] { 65 }
  • "A".getBytes("ISO-8859-1") returns new byte[] { 65 }
  • You write the results of those calls into a file
  • How is the consumer of the file supposed to distinguish the two?

There's nothing in that file that suggests that UTF-8 needs to be used to decode it.

Try writing "Käsekuchen" or something else that can't be encoded in ASCII, and see whether Notepad++ guesses the encoding correctly. That is exactly what it does: it makes an educated guess, because there is no metadata telling it which encoding to use.
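The sketch below shows why non-ASCII text gives a guesser something to work with. It encodes "Käsekuchen" as UTF-8 and as ISO-8859-1 and then applies one plausible heuristic: strict UTF-8 decoding that reports malformed input instead of replacing it. The heuristic and the names `GuessDemo`, `isValidUtf8` and `isPureAscii` are assumptions for illustration only, not Notepad++'s actual detection logic. The "ä" is written as `\u00e4` so the example does not depend on the compiler's source encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class GuessDemo {
    // Returns true if the bytes form valid UTF-8 (malformed input is reported, not replaced)
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns true if every byte is in the 7-bit ASCII range
    static boolean isPureAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) { // bytes 0x80..0xFF are negative as Java bytes
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "K\u00e4sekuchen"; // "Käsekuchen"
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(utf8.length);   // 11: "ä" becomes the two bytes C3 A4
        System.out.println(latin1.length); // 10: "ä" becomes the single byte E4

        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: E4 is not followed by continuation bytes
        // Pure ASCII is valid in every ASCII-compatible encoding, so it settles nothing
        System.out.println(isPureAscii("A".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

Under this heuristic, a file like "A" falls into the pure-ASCII case, where UTF-8, ASCII and ISO-8859-1 are all equally consistent with the bytes, so a tool has no basis for choosing one.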
