Why must I specify charset attributes for my <script> tags?
Problem description
I have a bit of an odd situation:
- Main HTML page is served in the UTF-16 character set (due to some requirements out of scope for this question)
- HTML page uses <script> tags to load external scripts (i.e. they have src attributes)
- Those external scripts are in US-ASCII/UTF-8
- The web server is serving the scripts with the content-type "application/javascript" with no character set hints
- The scripts have no byte-order mark (BOM)
When loading the page described above, both Firefox and Chrome (current versions) throw errors saying that the first character of each script file is invalid.
Looking at the "Network" tabs of the respective dev-tools views shows that the files arrive intact (they render correctly in the previewer).
My conclusion was that the browsers are becoming confused as to what the encoding should be for "the whole page" or some similar foolishness.
So I tried adding a charset="UTF-8" attribute to the <script> tags, and that seems to solve the problem.
But I really shouldn't have to do that, should I?
First of all, the server is telling the client what the document's type is. It's application/javascript and doesn't specify a character set. (Indeed, the RFC says that charset is only applicable to text/* MIME types.) Okay, I can understand why there might be some ambiguity there.
But the document type is JavaScript, and there are some obvious rules for how to handle a JavaScript file whose actual charset you don't know. For example, if it has a BOM, use it. If there is no BOM, it should be really easy to tell UTF-16 from UTF-8. (Note that there doesn't seem to be any problem on these same pages with loading CSS files, which are in the same situation as the scripts.)
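The heuristic the question has in mind can be sketched in a few lines of Python. This is a hypothetical illustration of the asker's proposal, not something browsers actually do: the function name guess_utf8_or_utf16 and the "more than half of the bytes are NUL" threshold are assumptions. It relies on the fact that characters in the range U+0001 to U+007F encoded as UTF-16 always produce one zero byte per code unit, while UTF-8 text contains no NUL bytes unless the text itself contains U+0000.

```python
def guess_utf8_or_utf16(body: bytes) -> str:
    """Hypothetical heuristic: guess UTF-8 vs UTF-16 for a BOM-less file.

    Assumption: the text is mostly ASCII-range characters, which UTF-16
    encodes as the ASCII byte plus one zero byte per code unit.
    """
    even, odd = body[0::2], body[1::2]
    # Little-endian UTF-16: "v" -> b"v\x00", so the zeros sit at odd offsets.
    if odd and odd.count(0) > len(odd) // 2:
        return "utf-16-le"
    # Big-endian UTF-16: "v" -> b"\x00v", so the zeros sit at even offsets.
    if even and even.count(0) > len(even) // 2:
        return "utf-16-be"
    # No zero-byte pattern: treat it as UTF-8 (US-ASCII is a subset of UTF-8).
    return "utf-8"


src = "var x = 1;"
for enc in ("utf-8", "utf-16-le", "utf-16-be"):
    print(enc, "->", guess_utf8_or_utf16(src.encode(enc)))
```

Browsers do not sniff scripts this way; a BOM-less script with no declared charset is decoded with a fallback encoding instead.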
Lastly, the enclosing page shouldn't have to know what the encoding of its dependencies is. In fact, it might be impossible for it to know, and explicitly specifying the charset then tightly couples the page to its dependencies and vice versa.
Is there a way to get the browser to correctly detect the character set of these dependencies without specifying the charset in the page itself?
Recommended answer
Without a BOM in the file, or an explicit charset in the <script> tag or the Content-Type header for the file, the encoding of the file is ambiguous. The browser might assume UTF-8 (and should, per RFC 4329), but if the script contains any non-ASCII characters that are not actually encoded in UTF-8, the file won't be processed properly.
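A short Python illustration of that ambiguity follows. The script contents and the choice of windows-1252 as the "other" encoding are made-up examples. Pure ASCII bytes decode identically under every ASCII-compatible encoding, so a wrong guess only matters once non-ASCII bytes appear: a windows-1252 é is the single byte 0xE9, which is not valid UTF-8 in this position, so a lenient UTF-8 decoder replaces it with U+FFFD.

```python
# Hypothetical script bodies used only for illustration.
ascii_script = b"var x = 1;\n"
legacy_script = "var name = 'José';\n".encode("cp1252")  # é is the single byte 0xE9

# ASCII-only bytes mean the same thing in every ASCII-compatible encoding.
for codec in ("ascii", "utf-8", "cp1252"):
    assert ascii_script.decode(codec) == "var x = 1;\n"

# Assuming UTF-8 for a windows-1252 file: 0xE9 followed by "'" is an
# invalid UTF-8 sequence, so errors="replace" substitutes U+FFFD for it.
as_utf8 = legacy_script.decode("utf-8", errors="replace")
as_cp1252 = legacy_script.decode("cp1252")

print("\ufffd" in as_utf8)                      # True: the é was lost
print(as_cp1252 == "var name = 'José';\n")      # True: correct encoding recovers it
```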
However, HTML 5 Section 4.11 dictates that a <script>'s fallback encoding is the document's encoding if the <script> does not have a charset attribute. The fallback takes effect if there is no BOM or Content-Type charset to specify the file's actual encoding.
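The precedence described above (BOM first, then the charset from the Content-Type header, then the fallback) can be sketched in Python. This is a simplified model, not browser code: decode_external_script and its parameters are hypothetical names, and only a handful of encoding labels are mapped to Python codecs. It also shows why the question's pages break. The WHATWG Encoding Standard maps the label "utf-16" to UTF-16LE, which is not ASCII-compatible, so when the UTF-16 document encoding becomes the fallback, every two bytes of the ASCII script are read as one unrelated UTF-16 code unit and the very first character is already wrong.

```python
import codecs

# Assumption: a tiny subset of WHATWG encoding labels mapped to Python codecs.
# The real Encoding Standard recognizes many more labels.
_LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "utf-16": "utf-16-le",  # WHATWG maps the bare "utf-16" label to UTF-16LE
    "utf-16le": "utf-16-le",
    "utf-16be": "utf-16-be",
    "windows-1252": "cp1252",
}

_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def _lookup(label):
    """Map an encoding label to a Python codec name, or None if unknown/absent."""
    if label is None:
        return None
    return _LABELS.get(label.strip().lower())


def decode_external_script(body, content_type_charset=None,
                           charset_attr=None, document_encoding="utf-8"):
    """Hypothetical model of how a browser picks a script's encoding."""
    # 1. A BOM wins over everything else and is stripped from the text.
    for bom, codec in _BOMS:
        if body.startswith(bom):
            return body[len(bom):].decode(codec, errors="replace")
    # 2. Next, the charset parameter of the HTTP Content-Type header.
    codec = _lookup(content_type_charset)
    # 3. Otherwise the fallback: <script charset> if usable, else the
    #    document's encoding. The final "utf-8" default for unknown labels
    #    is an assumption of this sketch.
    if codec is None:
        codec = _lookup(charset_attr) or _lookup(document_encoding) or "utf-8"
    # Browsers decode with replacement characters rather than failing.
    return body.decode(codec, errors="replace")


script = b"var x = 1;\n"  # US-ASCII script: no BOM, no charset anywhere
broken = decode_external_script(script, document_encoding="utf-16")
print(hex(ord(broken[0])))  # 0x6176: bytes "v" and "a" read as one UTF-16LE unit
fixed = decode_external_script(script, charset_attr="UTF-8", document_encoding="utf-16")
print(fixed == "var x = 1;\n")  # True
```

Passing content_type_charset="utf-8" fixes the same page without touching the <script> tag, which is the decoupling the question asks for.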
So, either make sure your HTML and JS files always use the same encoding, or else be explicit about the JS file's charset one way or the other: a BOM in the file, a charset parameter on its Content-Type header, or a charset attribute on the <script> tag.
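Of those options, the Content-Type charset probably fits the question's decoupling concern best, because it is set by whoever serves the file rather than by the page that includes it. As a minimal sketch, assuming the files are served by Python's built-in http.server (the question does not say which server is used, and the names Utf8StaticHandler and serve are hypothetical), overriding extensions_map is enough to label scripts and stylesheets as UTF-8:

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class Utf8StaticHandler(SimpleHTTPRequestHandler):
    """Static file handler that declares UTF-8 for scripts and stylesheets.

    guess_type() consults extensions_map first, so these entries decide
    the Content-Type header sent for .js and .css files.
    """

    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".js": "application/javascript; charset=utf-8",
        ".css": "text/css; charset=utf-8",
    }


def serve(directory: str = ".", port: int = 8000) -> None:
    # Hypothetical entry point: serve `directory` on localhost until interrupted.
    handler = partial(Utf8StaticHandler, directory=directory)
    with ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Calling serve("public") would then send "Content-Type: application/javascript; charset=utf-8" for every .js file, so the page itself never has to mention the scripts' encoding.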