UTF-8 HTML and CSS files with BOM (and how to remove the BOM with Python)

Question

First, some background: I'm developing a web application using Python. All of my (text) files are currently stored in UTF-8 with the BOM. This includes all my HTML templates and CSS files. These resources are stored as binary data (BOM and all) in my DB.

When I retrieve the templates from the DB, I decode them using template.decode('utf-8'). When the HTML arrives in the browser, the BOM is present at the beginning of the HTTP response body. This generates a very interesting error in Chrome:

Extra <html> encountered. Migrating attributes back to the original <html> element and ignoring the tag.

Chrome seems to generate an <html> tag automatically when it sees the BOM and mistakes it for content, making the real <html> tag an error.
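
The behaviour described above can be reproduced in a few lines. This is only a sketch, written in Python 3 syntax rather than the Python 2.5 mentioned in the question, and the `raw` bytes are a made-up stand-in for a template loaded from the database. It shows that the plain 'utf-8' codec keeps the BOM as the character U+FEFF at the start of the decoded text:

```python
# Hypothetical template bytes, standing in for what the database returns
raw = b'\xef\xbb\xbf<html><body>Hi</body></html>'

# The plain 'utf-8' codec decodes the BOM bytes into the character U+FEFF
text = raw.decode('utf-8')

# The decoded text therefore starts with U+FEFF, not with '<html>'
has_bom = text.startswith('\ufeff')
print(has_bom)
```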

So, using Python, what is the best way to remove the BOM from my UTF-8 encoded templates (if it exists -- I can't guarantee this in the future)?

For other text-based files like CSS, will major browsers correctly interpret (or ignore) the BOM? They are being sent as plain binary data without .decode('utf-8').

Note: I am using Python 2.5.

Thanks!

Recommended answer

Since you state:

All of my (text) files are currently stored in UTF-8 with the BOM

then use the 'utf-8-sig' codec to decode them:

>>> s = u'Hello, world!'.encode('utf-8-sig')
>>> s
'\xef\xbb\xbfHello, world!'
>>> s.decode('utf-8-sig')
u'Hello, world!'

It automatically removes the expected BOM, and works correctly if the BOM is not present as well.
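
As a rough sketch of how this could be wrapped up for the templates, here are two small helpers. The names `decode_template` and `strip_bom` are hypothetical and not from the original answer, and the code uses Python 3 syntax (the 'utf-8-sig' codec and `codecs.BOM_UTF8` are also available in Python 2.5). The second helper is an addition for the case raised in the question where files such as CSS are sent as raw bytes without decoding; it strips the BOM at the byte level instead:

```python
import codecs


def decode_template(raw):
    """Decode UTF-8 bytes to text, dropping a leading BOM if one is present."""
    # 'utf-8-sig' skips the BOM when it is there and behaves like 'utf-8' otherwise
    return raw.decode('utf-8-sig')


def strip_bom(raw):
    """Return the bytes unchanged except for a leading UTF-8 BOM, which is removed."""
    # codecs.BOM_UTF8 is the three-byte sequence b'\xef\xbb\xbf'
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# Hypothetical sample data: one payload with a BOM, one without
html = decode_template(b'\xef\xbb\xbf<html></html>')
css = strip_bom(b'body { margin: 0; }')
```

Both helpers leave input without a BOM untouched, so they are safe to apply even if some files are later saved without one.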
