Content-type META tag paradox?


Question


Why would anyone ever have expected a content-type META tag to be effective
at all? Is it because someone was misled by the happenstance that the letters
of the alphabet, the digits, and the characters {< > / ; , " ' =} happen to be
at the same locations in several particular common encodings (US-ASCII,
ISO-8859-1, etc.)?

Even assuming that these characters are always in the same locations, before
it can find my META tag in the first place, the UA has to make an *initial*
assumption as to whether the document is even 8-bit versus 16-bit.

Let's say I were perverse enough to save my HTML document using an editor
that encodes text as EBCDIC. How would the UA figure out not only where my
META tag is, but where ANYTHING is?
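The encoding point can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: `TAG` and `encoded_forms` are hypothetical names, and Python's `cp037` codec stands in for "EBCDIC" (it is one of several EBCDIC code pages). The sketch encodes the same tag in a few encodings and checks which byte sequences match the US-ASCII one.

```python
# Encode one META tag in several encodings and compare the raw bytes.
# TAG and encoded_forms are illustrative names, not part of any spec.
TAG = '<meta http-equiv="Content-Type">'

def encoded_forms(text):
    """Return {codec name: encoded bytes} for a few illustrative codecs."""
    codecs_to_try = ["ascii", "iso-8859-1", "utf-8", "cp037", "utf-16-le"]
    return {name: text.encode(name) for name in codecs_to_try}

forms = encoded_forms(TAG)
ascii_bytes = forms["ascii"]
for name, raw in forms.items():
    # ASCII-compatible encodings produce identical bytes for this tag;
    # EBCDIC (cp037) and UTF-16 do not.
    print(f"{name:11} same as ascii: {raw == ascii_bytes}")
```

A UA that scans the byte stream for an ASCII-encoded `<meta` would find the tag in the first three cases and nothing recognizable in the last two, which is the situation the EBCDIC question describes.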

And that just addresses the encoding. How about the content-type? What if I
were twisted enough to tell the UA, which has been assuming that my document
is an HTML text document just long enough to reach and parse my META tag in
the first place, that the document is really text/plain? (In that case,
would it start over from the beginning based on text/plain, this time not
doing any parsing at all, and therefore not finding a META tag at all, and
therefore not finding an override for the default supposition of text/html,
and therefore AGAIN starting from the beginning and parsing the document as
HTML, and then finding the META tag and realizing the document is
text/plain, and starting all over again based on that assumption, and then
.....) Or image/gif?

So I'm just curious how the content-type META tag got into the spec in the
first place. It seems to defy logic.

--
Harlan Messinger
Remove the first dot from my e-mail address.
Veuillez ôter le premier point de mon adresse de courriel.

Solution

"Harlan Messinger" <h.*********@comcast.net> wrote in message
news:c5************@ID-114100.news.uni-berlin.de

> Let's say I were perverse enough to save my HTML document using an
> editor that encodes text as EBCDIC. How would the UA figure out not
> only where my META tag is, but where ANYTHING is?

I don't understand. HTML markup must use only US-ASCII characters, and so
must the content-type attribute values...
Huh?

Harlan Messinger wrote:
[snip]

> And that just addresses the encoding. How about the content-type? What if I
> were twisted enough to tell the UA, which has been assuming that my
> document is an HTML text document just long enough to reach and parse my
> META tag in the first place, that the document is really text/plain? (In
> that case, would it start over from the beginning based on text/plain,
> this time not doing any parsing at all, and therefore not finding a META
> tag at all, and therefore not finding an override for the default
> supposition of text/html, and therefore AGAIN starting from the beginning
> and parsing the document as HTML, and then finding the META tag and
> realizing the document is text/plain, and starting all over again based on
> that assumption, and then ....) Or image/gif?

The HTTP 1.1 specification makes it clear that the Content-Type header
should take precedence over anything that may be in the response body:

"If and only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its content
and/or the name extension(s) of the URI used to identify the resource."

-- <URL:http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1>

So if the HTTP headers say it's text/plain, text/plain it is (unless you use
a browser that violates the HTTP 1.1 specification).
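The precedence rule quoted above can be sketched in Python. This is a minimal sketch, not how any particular browser does it: `effective_media_type` and `guess_from_body` are hypothetical names, and the headers are assumed to arrive as a plain dict.

```python
def effective_media_type(headers, guess_from_body):
    """Pick the media type as the quoted RFC 2616 sentence describes.

    headers: dict of response header names to values (assumed shape).
    guess_from_body: zero-argument callable used only when no
    Content-Type header was sent (hypothetical sniffing hook).
    """
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("content-type")
    if value:
        # Drop parameters such as "; charset=..." and keep the bare type.
        return value.split(";", 1)[0].strip().lower()
    # Only without a header may the recipient guess from the content.
    return guess_from_body()
```

With a header present the body is never consulted for the media type, so the endless text/html versus text/plain restart described in the question cannot get started.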

> So I'm just curious how the content-type META tag got into the spec in the
> first place. It seems to defy logic.



According to the HTML 4.01 specification, <meta> elements with http-equiv
attributes are designed to be parsed by the server and converted to proper
HTTP headers. In reality, this is usually impractical, and browsers
started to pay attention themselves a long time ago. As you've said, this
can lead to some stupid results.

"HTTP servers may use the property name specified by the http-equiv
attribute to create an [RFC822]-style header in the HTTP response."

-- <URL:http://www.w3.org/TR/html401/struct/global.html#h-7.4.4.2>
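What the quoted sentence allows a server to do could look roughly like the following Python sketch. It uses the standard library's `html.parser`; the names `HttpEquivCollector` and `headers_from_document` are hypothetical, and a real server would also need to decide which header names it is willing to emit.

```python
from html.parser import HTMLParser

class HttpEquivCollector(HTMLParser):
    """Collect (name, value) pairs from <meta http-equiv=... content=...>."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, value = attr.get("http-equiv"), attr.get("content")
        if name and value:
            self.headers.append((name, value))

def headers_from_document(html_text):
    """Return the header pairs a server might copy into its HTTP response."""
    collector = HttpEquivCollector()
    collector.feed(html_text)
    collector.close()
    return collector.headers
```

Note that the server has to decode and parse the document before it can do this, so it faces the same bootstrapping question as the browser.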

As far as the paradox of figuring out the character encoding goes, the way I
understand it is that if the HTTP headers don't indicate the encoding, it
defaults to US-ASCII, and when it gets to the relevant <meta> element, the
browser has the option of starting again with that character encoding.
This means that as long as you use a superset of US-ASCII, it will "work".
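The "start again" behaviour described above can be sketched in Python as follows. This is a simplified sketch under assumptions, not a real browser algorithm: `decode_html`, `META_CHARSET`, the 1024-byte scan limit, and the US-ASCII fallback are all illustrative choices.

```python
import re

# Look for charset=... inside a <meta ...> tag, reading the bytes as ASCII.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)

def decode_html(raw, header_charset=None, fallback="us-ascii", scan_limit=1024):
    """Return (text, charset_used) for a byte string holding an HTML document."""
    if header_charset:
        # The HTTP header wins; the body is not scanned at all.
        return raw.decode(header_charset, errors="replace"), header_charset
    match = META_CHARSET.search(raw[:scan_limit])
    if match:
        declared = match.group(1).decode("ascii")
        try:
            # "Start again": decode the whole document with the declared charset.
            return raw.decode(declared, errors="replace"), declared
        except LookupError:
            pass  # Unknown charset name: fall through to the default.
    return raw.decode(fallback, errors="replace"), fallback
```

The scan only succeeds when the bytes of the tag itself read as ASCII, which is why a superset of US-ASCII works and an EBCDIC document would fall through to the fallback.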

I believe the rules for default character encodings change when you start
talking about XHTML (plus you have to throw the XML prolog into the mix).
Also remember that a browser can (reliably?) detect UTF-16 by the BOM.
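BOM detection can be sketched with the constants in Python's `codecs` module. `sniff_bom` and `BOMS` are hypothetical names, and the sketch deliberately checks only UTF-8 and UTF-16. One reason for the "(reliably?)" hedge: the bytes FF FE that mark little-endian UTF-16 are also the first two bytes of the little-endian UTF-32 BOM.

```python
import codecs

# Byte order marks this sketch recognizes, paired with Python codec names.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)

def sniff_bom(raw):
    """Return (codec name or None, bytes with any recognized BOM removed)."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, raw[len(bom):]
    # No BOM: the caller has to fall back on headers, META tags, or a default.
    return None, raw
```

A UA that runs a check like this first can decode a UTF-16 document well enough to reach the META tag, which addresses the 8-bit versus 16-bit concern in the question.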

You'll probably want to read through this lot if you haven't already:

<URL:http://ppewww.ph.gla.ac.uk/~flavell/charset/>
--
Jim Dabell

