Writing UTF-8 without BOM
Question
This code
OutputStream out = new FileOutputStream(new File("C:/file/test.txt"));
// getBytes() without an argument uses the platform default charset
out.write("A".getBytes());
and this code
OutputStream out = new FileOutputStream(new File("C:/file/test.txt"));
out.write("A".getBytes(StandardCharsets.UTF_8));
produce the same result (in my opinion), which is UTF-8 without BOM. However, Notepad++ does not show any information about the encoding. I expected Notepad++ to show "Encode in UTF-8 without BOM", but no encoding is selected in the "Encoding" menu.
Now, this code writes the file in UTF-8 with a BOM.
OutputStream out = new FileOutputStream(new File("C:/file/test.txt"));
byte[] bom = { (byte) 239, (byte) 187, (byte) 191 };
out.write(bom);
out.write("A".getBytes());
Here Notepad++ displays the encoding type as "Encode in UTF-8".
Question: What is wrong with the first two snippets, which are supposed to write the file in UTF-8 without BOM? Is my Java code doing the right thing? If so, is there a problem with how Notepad++ tries to detect the encoding type?
Is Notepad++ only guessing?
Recommended answer
"A" written using UTF-8 without a BOM produces exactly the same file as "A" written using ASCII, ISO-8859-*, or any other ASCII-compatible encoding. That file contains a single byte with the decimal value 65.
Think of it this way:
- "A".getBytes("UTF-8") returns a new byte[] { 65 }
- "A".getBytes("ISO-8859-1") returns a new byte[] { 65 }
- You write the results of those calls into a file
- How is the consumer of the file supposed to distinguish the two?
There's nothing in that file that suggests that UTF-8 needs to be used to decode it.
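As a rough illustration of that point, the sketch below encodes the same string with three ASCII-compatible charsets and compares the resulting bytes. The class name AsciiCompatibleDemo and the helper sameBytes are made up for this example and are not part of the original question or answer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompatibleDemo {
    // Hypothetical helper: true if the text encodes to identical bytes
    // in UTF-8, ISO-8859-1 and US-ASCII.
    public static boolean sameBytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, latin1) && Arrays.equals(utf8, ascii);
    }

    public static void main(String[] args) {
        // The single byte 65, regardless of which of the three charsets is used
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_8))); // [65]
        System.out.println(sameBytes("A")); // true
    }
}
```

Because the bytes are identical, a reader of the file has no way to tell which charset the writer had in mind.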
Try writing "Käsekuchen" or something else that's not encodable in ASCII and see if Notepad++ guesses the encoding correctly (because that's exactly what it does: it makes an educated guess, since there's no metadata that tells it which encoding to use).
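To try this from Java, something like the following sketch could be used. It writes "Käsekuchen" as UTF-8 without a BOM to a temporary file and prints the bytes in hex, so the difference from ISO-8859-1 is visible: "ä" becomes the two bytes C3 A4 in UTF-8 but the single byte E4 in ISO-8859-1. The class name NonAsciiDemo, the helpers startsWithUtf8Bom and hex, and the temporary file are assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class NonAsciiDemo {
    // Hypothetical helper: true if the data starts with the UTF-8 BOM (EF BB BF).
    public static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    public static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String text = "K\u00e4sekuchen"; // "Käsekuchen", escaped so the source file encoding does not matter
        Path file = Files.createTempFile("kaesekuchen", ".txt"); // temporary file, for illustration only
        Files.write(file, text.getBytes(StandardCharsets.UTF_8)); // no BOM is written

        byte[] written = Files.readAllBytes(file);
        System.out.println("UTF-8:      " + hex(written));
        System.out.println("Has BOM:    " + startsWithUtf8Bom(written));
        System.out.println("ISO-8859-1: " + hex(text.getBytes(StandardCharsets.ISO_8859_1)));
        Files.delete(file);
    }
}
```

With the two-byte sequence for "ä" in the file, an editor that guesses encodings has something to go on, which a file containing only "A" does not provide.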