What is XML BOM and how do I detect it?


Question

What exactly is the BOM in an ANSI XML document, and should it be removed? Should an XML document be in UTF-8 instead? Can anyone tell me a Java method that will detect the BOM? The BOM consists of the bytes EF BB BF.

Recommended answer

For an ANSI XML file the BOM should actually be removed: EF BB BF is the UTF-8 BOM and does not belong in a file that uses a single-byte encoding. If you want to use UTF-8 you don't really need it either. It is only needed for UTF-16 and UTF-32.


The byte order mark (BOM) is a special marker added at the very beginning of a Unicode file encoded in UTF-8, UTF-16 or UTF-32. For UTF-16 and UTF-32 it indicates whether the file uses big-endian or little-endian byte order. UTF-8 has no byte-order ambiguity, so there the BOM only acts as an encoding signature and is optional. The XML specification requires a BOM for UTF-16 documents and allows one for UTF-8.

Regarding the question of how to detect this in Java:

Check the answers to this question: "Java: How to determine the correct charset encoding of a stream". If you want to detect the BOM yourself (at your own risk), see for example this code: "Java Tip: How to read a file and automatically specify the correct encoding".

Basically, just read in the first few bytes yourself and check whether they match a known BOM.
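
Reading the first few bytes and comparing them against the known BOM patterns could look roughly like the sketch below. It is not the code from the linked articles. The class name `BomDetector` and the methods `detectBom`, `bomLength` and `skipBom` are made up for illustration, and the sketch assumes the caller wraps the stream in a `PushbackInputStream` with a pushback buffer of at least 4 bytes.

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.Arrays;

// Hypothetical helper, not part of any library.
final class BomDetector {

    /** Returns the charset name implied by a leading BOM, or null if no BOM is found. */
    static String detectBom(byte[] head) {
        // Check the 4-byte UTF-32 marks first: FF FE 00 00 also starts with the
        // UTF-16LE mark FF FE. (A NUL character is not legal in XML, so reading
        // FF FE 00 00 as UTF-32LE is a reasonable assumption for XML input.)
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** Length in bytes of the BOM for a name returned by detectBom (0 for null). */
    static int bomLength(String charset) {
        if (charset == null) return 0;
        if (charset.startsWith("UTF-32")) return 4;
        if (charset.startsWith("UTF-16")) return 2;
        return 3; // UTF-8
    }

    /**
     * Reads up to 4 bytes, detects a BOM, and pushes back every byte that is
     * not part of the BOM, so the stream continues right after it.
     * The stream must have been created with a pushback buffer of at least 4 bytes.
     */
    static String skipBom(PushbackInputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break; // end of stream before 4 bytes were available
            n += r;
        }
        byte[] seen = Arrays.copyOf(head, n);
        String charset = detectBom(seen);
        int bomLen = bomLength(charset);
        if (n > bomLen) {
            in.unread(seen, bomLen, n - bomLen);
        }
        return charset;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller might create the stream as `new PushbackInputStream(rawStream, 4)`, call `BomDetector.skipBom(...)`, and fall back to the encoding named in the XML declaration when the result is null. A null result only means that no BOM was found. It says nothing about the actual encoding of the file.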
