Why is "&#8203;" being injected into my HTML?


Question

EDIT: You can see the issue here (look in source).

EDIT2: Interesting, it is not an issue in source. Only with the console (Firebug as well).

I have the following markup in a file called test.html:

​<!DOCTYPE html>
<html>
<head>
    <title>Test Harness</title>
    <link href='/css/main.css' rel='stylesheet' type='text/css' />
</head>
<body>
    <h3>Test Harness</h3>
</body>
</html>

But in Chrome, I see:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
    "&#8203;


        "
    <title>Test Harness</title>
    <link href='/css/main.css' rel='stylesheet' type='text/css' />
    <h3>Test Harness</h3>
</body>
</html>

It looks like &#8203; is a zero-width space, but what is causing it? I am using Sublime Text 2 with UTF-8 encoding and Google App Engine with Jinja2 (but Jinja is simply loading test.html). Any thoughts?

Thanks in advance.

Solution

It is an issue in the source. The live example that you provided starts with the following bytes (i.e., they appear before <!DOCTYPE html>): 0xE2 0x80 0x8B. This can be seen e.g. using Rex Swain’s HTTP Viewer by selecting "Hex" under "Display Format". Also note that validating the page with the W3C Markup Validator gives information that suggests that there is something very wrong at the start of the document, especially the message "Line 1, Column 1: Non-space characters found without seeing a doctype first."
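A hex viewer is not strictly required to check this. As a minimal sketch (the helper name `leading_bytes_hex` and the sample bytes are hypothetical, made up for illustration and not taken from the original page), a few lines of Python can show what a document begins with:

```python
def leading_bytes_hex(data: bytes, count: int = 3) -> str:
    """Return the first `count` bytes formatted like '0xE2 0x80 0x8B'."""
    return " ".join(f"0x{b:02X}" for b in data[:count])


# Hypothetical sample: a document that starts with the three stray bytes.
sample = b"\xe2\x80\x8b<!DOCTYPE html>\n<html></html>\n"
print(leading_bytes_hex(sample))  # 0xE2 0x80 0x8B
```

For a file on disk, the same helper could be fed the result of reading the file in binary mode, so that no decoding happens before the bytes are inspected.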

What happens in the validator and in the Chrome tools – as well as e.g. in Firebug – is that the bytes 0xE2 0x80 0x8B are taken as character data, which implicitly starts the body element (since character data cannot validly appear in the head element or before it), implying an empty head element before it.

The solution, of course, is to remove those bytes. Browsers usually ignore them, but you should not rely on such error handling, and the bytes prevent useful HTML validation. How you remove them, and how they got there in the first place, depends on your authoring environment.
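If the editor cannot show or delete the invisible character, the bytes can also be stripped programmatically. The following is only a sketch under the assumption that the document is UTF-8 encoded; `strip_leading_zwsp` is a hypothetical helper name, not part of any tool mentioned above:

```python
ZWSP_UTF8 = "\u200b".encode("utf-8")  # the three bytes 0xE2 0x80 0x8B


def strip_leading_zwsp(data: bytes) -> bytes:
    """Drop any ZERO WIDTH SPACE sequences at the very start of `data`."""
    while data.startswith(ZWSP_UTF8):
        data = data[len(ZWSP_UTF8):]
    return data


cleaned = strip_leading_zwsp(b"\xe2\x80\x8b<!DOCTYPE html>")
print(cleaned)  # b'<!DOCTYPE html>'
```

Working on raw bytes rather than decoded text keeps the rest of the file untouched, which matters if the goal is only to remove the stray prefix.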

Since the page is declared (in HTTP headers) as being UTF-8 encoded, those bytes represent the ZERO WIDTH SPACE (U+200B) character. It has no visible glyph and no width, so you won’t notice anything in the visual presentation even though browsers treat it as being data at the start of the body element. The notation &#8203; is a character reference for it, presumably used by browser tools to indicate the presence of a normally invisible character.
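The relationship between the three bytes, the code point U+200B, and the reference &#8203; can be checked with Python's standard library. This is an illustrative sketch; the variable names are arbitrary:

```python
import html
import unicodedata

raw = b"\xe2\x80\x8b"            # the bytes found before <!DOCTYPE html>
char = raw.decode("utf-8")       # interpret them as UTF-8

print(f"U+{ord(char):04X}")      # U+200B
print(unicodedata.name(char))    # ZERO WIDTH SPACE
print(ord(char))                 # 8203, the decimal number in &#8203;

# The character reference decodes back to the same character.
print(html.unescape("&#8203;") == char)  # True
```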

It is possible that the software that produced the HTML document was meant to insert ZERO WIDTH NO-BREAK SPACE (U+FEFF) instead. That would have been valid, since by a special convention, UTF-8 encoded data may start with this character, which is known as the byte order mark (BOM) when it appears at the start of data. Using U+200B instead of U+FEFF sounds like an error that software is unlikely to make, but human beings may confuse the two if they go by the Unicode names of the characters.
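To make the difference between the two characters concrete, here is a small Python sketch. It uses the standard `codecs` module and the `utf-8-sig` codec, which skips a leading BOM when decoding; the sample byte strings are made up for illustration:

```python
import codecs
import unicodedata

bom = "\ufeff"
zwsp = "\u200b"

print(unicodedata.name(bom))                    # ZERO WIDTH NO-BREAK SPACE
print(bom.encode("utf-8") == codecs.BOM_UTF8)   # True: bytes 0xEF 0xBB 0xBF
print(zwsp.encode("utf-8") == b"\xe2\x80\x8b")  # True: the bytes in question

# A BOM-aware decoder drops U+FEFF at the start but keeps U+200B as data.
with_bom = (codecs.BOM_UTF8 + b"<!DOCTYPE html>").decode("utf-8-sig")
with_zwsp = (b"\xe2\x80\x8b" + b"<!DOCTYPE html>").decode("utf-8-sig")
print(with_bom.startswith("<!DOCTYPE"))   # True
print(with_zwsp.startswith("<!DOCTYPE"))  # False
```

This mirrors the behavior described above: a leading U+FEFF is treated as an encoding signature, while a leading U+200B is ordinary character data.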
