Setting encoding in XML files


Problem description


Which are the valid xml encoding strings? For instance, what is the way of specifying UTF-8:

  • encoding="utf8"
  • encoding="utf8"
  • etc

Or Windows 1251:

  • encoding="windows-1251"
  • encoding="windows1251"
  • encoding="cp-1251"
  • etc.

I am writing a character decoder as well as an XML parser, so I need to set the encoding of my StreamReader based on the value of the encoding attribute.

Any ideas where I could find a list of the official encoding strings?

The best I could find is this, but it seems to be IE-specific.

Thanks!

Solution

If all else fails, read the spec :-).

4.3.3 Character Encoding in Entities

Each external parsed entity in an XML document may use a different encoding for its characters.

[...]

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) SHOULD be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" SHOULD be used for the various encoded forms of JIS X-0208-1997.

It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an "x-" prefix.

Source: http://www.w3.org/TR/REC-xml/

So UTF-8 is written as encoding="UTF-8".
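
To pick a decoder you first have to pull the declared name out of the raw bytes. Here is a minimal sketch in Python rather than .NET; `declared_encoding` is a hypothetical helper, not part of any library. It assumes the document starts in an ASCII-compatible encoding, so it will not see a declaration in UTF-16 or EBCDIC input, and it falls back to UTF-8 when no declaration is present. The name pattern follows the spec's EncName production.

```python
import codecs
import re

# EncName from the XML spec: [A-Za-z] ([A-Za-z0-9._] | '-')*
_DECL_RE = re.compile(
    rb"""^<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def declared_encoding(head: bytes, default: str = "UTF-8") -> str:
    """Return the encoding name declared in the XML prolog, or the default.

    `head` is the first chunk of the file as raw bytes. Only
    ASCII-compatible input is handled (an assumption of this sketch).
    """
    # Skip a UTF-8 byte order mark if one precedes the declaration.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    match = _DECL_RE.match(head)
    if match is None:
        return default
    return match.group(2).decode("ascii")


print(declared_encoding(b'<?xml version="1.0" encoding="windows-1251"?><r/>'))
```

The returned string is exactly what the author wrote, so it still needs to be mapped to an actual decoder before use.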

For other character sets not listed above, use the names given in the IANA character set list.

Case of the letters in the character set name is not significant: "However, no distinction is made between use of upper and lower case letters." (IANA character set list). So you could also write encoding="uTf-8" if you feel like it ;-).
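
Because the names are case-insensitive, a decoder should not compare them as exact strings. As a rough illustration in Python (not the asker's .NET environment), the standard `codecs.lookup` function already resolves names without regard to case. `resolve_charset` is a hypothetical wrapper, and Python's alias table is its own list rather than the IANA registry, so treat unknown names as unsupported instead of guessing.

```python
import codecs


def resolve_charset(name: str):
    """Map a declared encoding name to a Python codec, or None if unknown."""
    try:
        return codecs.lookup(name.strip())
    except LookupError:
        # Name is not known to this Python installation.
        return None


for label in ("UTF-8", "uTf-8", "windows-1251", "x-not-a-real-charset"):
    info = resolve_charset(label)
    print(label, "->", info.name if info else "unsupported")
```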

BTW: Are you really, really certain you want to write your own XML parser? This sounds suspiciously like reinventing the wheel.
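
For comparison, an existing parser usually reads the encoding declaration for you when it is given raw bytes. The sketch below uses Python's standard `xml.etree.ElementTree` purely as an example of that behaviour; the sample document is made up, and other platforms have their own equivalents.

```python
import xml.etree.ElementTree as ET

# Sample document encoded as Windows-1251, with a matching declaration.
raw = (
    '<?xml version="1.0" encoding="windows-1251"?>'
    "<greeting>Привет</greeting>"
).encode("cp1251")

# Passing bytes lets the parser pick the decoder from the declaration.
root = ET.fromstring(raw)
print(root.tag, root.text)
```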
