What use is the 'encoding' in the XML header?


Question


    Looking at the XML header

    <?xml version="1.0" encoding="UTF-16" standalone="no"?>
    

    Am I right to state that the encoding attribute is

    • coming too late (you can't read it properly unless you know the encoding...)
    • redundant, hence error-prone: it's all too easy to replace it with "Big5" yet save the file in UTF-8

    Or is that attribute not about the content of the stream?

    Am I mixing up things here?

    Solution

    As you mentioned, you'd have to know the encoding of the file to read the encoding attribute.

    However, there is a heuristic that gets you close enough to the "real" encoding to read the encoding attribute. It works because the <?xml part can, by definition, only contain characters in the ASCII range (however they are encoded), so the first few bytes reveal the encoding family.

    The XML standard even describes the exact process used to find out the encoding (Appendix F of the XML 1.0 specification, "Autodetection of Character Encodings").
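
The detection process can be sketched roughly as follows. This is a simplified illustration in Python, not the spec's full table: it covers only UTF-8, UTF-16, UTF-32 and the generic ASCII-compatible case (EBCDIC and unusual byte orders are left out), and the names `sniff_family` and `declared_encoding` are made up for this example.

```python
import re

# Byte-order marks are checked first. Longer marks come before shorter ones,
# because the UTF-32LE mark starts with the same two bytes as the UTF-16LE mark.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# Without a byte-order mark, the way "<?" is laid out reveals the family.
_PATTERNS = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"<?xm", "ascii"),  # some ASCII-compatible encoding, not yet known which
]

_DECL = re.compile(
    r"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_family(data: bytes):
    """Return (bytes_to_skip, provisional_codec) from the first few bytes."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return len(bom), codec
    for pattern, codec in _PATTERNS:
        if data.startswith(pattern):
            return 0, codec
    return 0, "utf-8"  # no declaration at all: UTF-8 is the default


def declared_encoding(data: bytes):
    """Return (provisional_codec, encoding_label_or_None)."""
    skip, codec = sniff_family(data)
    # The declaration is ASCII-only, so the provisional codec is enough here.
    head = data[skip:skip + 400].decode(codec, errors="replace")
    match = _DECL.match(head)
    return codec, (match.group(1) if match else None)


header = '<?xml version="1.0" encoding="UTF-16" standalone="no"?><a/>'
print(declared_encoding(header.encode("utf-16-le")))
print(declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```

The provisional codec is only used to read the declaration; the label found there decides how the rest of the document is decoded.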

    And the encoding label isn't redundant either. For example, if the algorithm in the XML spec tells you that some ASCII-based (or ASCII-compatible) encoding is used, you still need to read the encoding attribute to find out which one is actually used (valid candidates would be ASCII, UTF-8, any of the ISO-8859-* encodings, any of the Windows-* encodings, KOI8-R and many, many others). For the <?xml part itself it won't make a difference which one it is, but for the rest of the document it can make a huge difference.
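
A small Python illustration of that point, using a made-up document: the declaration decodes identically under several ASCII-compatible encodings, while a single non-ASCII byte in the body does not.

```python
# One byte outside the ASCII range (0xE9) in the element content.
doc = b'<?xml version="1.0"?><a>\xe9</a>'

candidates = ["iso-8859-1", "windows-1252", "koi8-r", "iso-8859-5"]
decoded = {name: doc.decode(name) for name in candidates}

head_len = len('<?xml version="1.0"?>')
heads = {text[:head_len] for text in decoded.values()}
bodies = {name: text[head_len:] for name, text in decoded.items()}

print(len(heads))  # the declaration reads the same in every candidate
for name, body in bodies.items():
    print(name, body)  # the body depends on which encoding is assumed
```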

    Regarding mislabeled XML files: yes, it's easy to produce those. However, the XML spec clearly specifies that those files are malformed and as such are not correct XML. Incorrect encodings must be reported as an error (as long as they can be detected!). So it's the problem of whoever is producing the XML.
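
The "as long as they can be detected" caveat can be seen with Python's standard `xml.etree.ElementTree` parser. This is only a sketch with invented sample documents; `try_parse` is a helper written for this example. A file labeled UTF-8 that contains a stray Latin-1 byte is rejected, while UTF-8 bytes labeled ISO-8859-1 parse without complaint and silently yield the wrong characters.

```python
import xml.etree.ElementTree as ET


def try_parse(data: bytes):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# Labeled UTF-8, but 0xE9 alone is not a valid UTF-8 sequence: detectable.
bad_utf8 = b'<?xml version="1.0" encoding="UTF-8"?><a>\xe9</a>'

# Labeled ISO-8859-1, but the bytes are the UTF-8 form of "é": every byte is
# valid in ISO-8859-1, so the parser has nothing to complain about.
mislabeled = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xc3\xa9</a>'

print(try_parse(bad_utf8))
print(repr(try_parse(mislabeled)))
```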
