Spec for handling of HTML entities in a[href]

Question

I'm looking for a spec on handling HTML entities in the href attribute of <a> tags. So far, no luck (I might be searching for something too specific).

In detail:

The bug I'm trying to fix is part of the cheerio project (https://github.com/MatthewMueller/cheerio).

Some entities don't require a semicolon at the end. One of them is &curren. Anyway, this leads to problems when a source links to /test/example.jsp?item=123&currentSize=S&currentQty=1.

Browsers (at least Chrome) handle this nicely. I still haven't figured out why though.

Solution

Regarding HTML up to and including HTML 4.01, see @Quentin’s answer.

Regarding any flavor of XHTML, including HTML5 in XHTML serialization, &currentSize= contains a well-formedness error, so any display of the document is aborted (when the document is processed as truly XHTML).
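To illustrate the well-formedness point, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser as a stand-in for an XML-mode browser parser. The helper name `parses_as_xml` and the shortened URLs are assumptions made for the example only.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# A bare "&" followed by a name and no ";" is not well-formed XML.
unescaped = '<a href="/test/example.jsp?item=123&currentSize=S">x</a>'
# Escaping the ampersand as "&amp;" makes the attribute value well-formed.
escaped = '<a href="/test/example.jsp?item=123&amp;currentSize=S">x</a>'

print(parses_as_xml(unescaped))  # False
print(parses_as_xml(escaped))    # True
```

An XML parser stops at the first such error, which is why the whole document fails to display rather than just the one link.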

In HTML5 in HTML serialization, there are tricky ad hoc rules for parsing character references. They imply that in text content, &currentSize= would be parsed as if it were written &curren;tSize=, i.e. as ¤tSize=. But within an attribute value, as in <a href="...">, the reference is, under certain conditions, not recognized, since it is not terminated by a semicolon.
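The text-content behavior can be reproduced with Python's `html.unescape`, which is documented as following the HTML5 rules for valid and invalid character references. It knows nothing about attribute context, so it shows what happens when the attribute exception is not applied. This is only an illustration, not what cheerio itself does.

```python
import html

# The URL from the question, treated as if it were plain text content.
text = "/test/example.jsp?item=123&currentSize=S&currentQty=1"

# "&curren" is recognized without a semicolon, so "&currentSize" becomes
# the currency sign followed by "tSize".
decoded = html.unescape(text)
print(decoded)
```

A parser that applies this text-content rule to href values corrupts the query string in the same way.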

Specifically, the conditions described there are: "If the character reference is being consumed as part of an attribute, and the last character matched is not a ";" (U+003B) character, and the next character is either a "=" (U+003D) character or in the range ASCII digits, uppercase ASCII letters, or lowercase ASCII letters, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned." So no &foobar= will be recognized in an attribute value, even if foobar is a defined name.
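The quoted rule can be sketched roughly as follows. This is a simplified illustration, not a full HTML5 tokenizer: the function name `decode_attr_named_refs` is hypothetical, numeric references such as `&#38;` are deliberately left untouched, and the entity table comes from Python's `html.entities.html5`, which lists legacy names both with and without the trailing semicolon.

```python
import re
from html.entities import html5  # e.g. "curren" and "curren;" both map to the currency sign

# "&", then a run of ASCII letters/digits, then an optional ";".
_NAMED_REF = re.compile(r"&([A-Za-z0-9]+)(;?)")


def decode_attr_named_refs(value: str) -> str:
    """Decode named character references in an attribute value (sketch)."""

    def replace(m: re.Match) -> str:
        name, semicolon = m.group(1), m.group(2)
        # A properly terminated reference is always decoded.
        if semicolon and name + ";" in html5:
            return html5[name + ";"]
        # A legacy name without ";" is decoded only if the next character
        # is not "=". An alphanumeric next character cannot occur here,
        # because the regex already consumed the whole alphanumeric run;
        # so "&currentSize" is looked up as a whole and never matches.
        if not semicolon and name in html5:
            next_char = value[m.end():m.end() + 1]
            if next_char != "=":
                return html5[name]
        # Otherwise the characters are "unconsumed": keep them literally.
        return m.group(0)

    return _NAMED_REF.sub(replace, value)


href = "/test/example.jsp?item=123&currentSize=S&currentQty=1"
print(decode_attr_named_refs(href) == href)   # True: the URL survives intact
print(decode_attr_named_refs("a=1&amp;b=2"))  # a=1&b=2
```

In attribute context a semicolon-less legacy name is therefore decoded only when it is followed by something other than "=" or an alphanumeric character, such as a space or the end of the value.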

The reason is that authors have widely written URLs in attribute values without escaping & and browsers have adapted to this.
