How to read the encoding header without knowing the encoding?


Question

If I am reading an XML or HTML file, don't I have to read the tag that tells me the encoding before I can read the file? Isn't that tag encoded the same way as the rest of the file? I am curious how you read that tag without knowing the encoding. I realize this is a solved problem; I am just curious how it's done.

Update 1

I don't get it. In UTF-16, won't each character take 2 bytes, not one, and so be different from ASCII? For example, the character E in UTF-16 (U+0045) is 0xfeff0045. That is 0xfeff then 0x0045, but some encodings change the endianness of that. Do you have to figure it out by checking for 0xfeff and realizing that it can't be ASCII or something?

Recommended answer

Here's what W3C has to say about it:


The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use--which is what the internal label is trying to indicate. In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways: each implementation is assumed to support only a finite set of character encodings, and the XML encoding declaration is restricted in position and content in order to make it feasible to autodetect the character encoding in use in each entity in normal cases.

http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing
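
To make the quoted idea concrete, here is a minimal Python sketch of that kind of autodetection. Because an XML declaration must come first and must begin with `<?xml`, the first few bytes are enough to guess the encoding family (byte width and byte order), and that guess is enough to decode the declaration and read its `encoding` attribute. The byte patterns follow the table in the linked section as I understand it. The function names `sniff_family` and `detect_xml_encoding`, the 400-byte window, and the regular expression are assumptions made for illustration, not part of any specification or library.

```python
import re

# Byte-order marks. The 4-byte UTF-32 marks must be tested before the
# 2-byte UTF-16 marks, because FF FE is a prefix of FF FE 00 00.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# No byte-order mark: look at how the leading "<?xm" (or "<") is laid out.
_NO_BOM = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"<?xm", "utf-8"),             # any ASCII-compatible encoding
    (b"\x4c\x6f\xa7\x94", "cp037"),  # "<?xm" in EBCDIC
]

# Hypothetical helper pattern: pulls the encoding name out of the declaration.
_DECL = re.compile(
    r"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_family(data):
    """Return (provisional codec, number of BOM bytes to skip)."""
    for prefix, codec in _BOMS:
        if data.startswith(prefix):
            return codec, len(prefix)
    for prefix, codec in _NO_BOM:
        if data.startswith(prefix):
            return codec, 0
    return "utf-8", 0  # no declaration and no BOM: XML defaults to UTF-8


def detect_xml_encoding(data):
    """Return the declared encoding name, or the sniffed codec if none is declared."""
    codec, skip = sniff_family(data)
    # The declaration uses only ASCII characters, so the provisional codec
    # is good enough to read it even if the real encoding differs.
    head = data[skip:skip + 400].decode(codec, errors="replace")
    match = _DECL.match(head)
    return match.group(1) if match else codec
```

This also speaks to the update in the question: 0xFEFF is not part of the character E. It is a byte-order mark that may appear once at the start of the file, and the order in which its two bytes arrive tells the reader which endianness the rest of the text uses. When there is no mark, the zero bytes around `<` and `?` give the same information.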
