UTF-8 HTML and CSS files with BOM (and how to remove the BOM with Python)

Views: 86 | Published: 2020/7/13 2:39:46 | Tags: python, file, utf-8, byte-order-mark


Question

First, some background: I'm developing a web application using Python. All of my (text) files are currently stored in UTF-8 with the BOM. This includes all my HTML templates and CSS files. These resources are stored as binary data (BOM and all) in my DB.

When I retrieve the templates from the DB, I decode them using template.decode('utf-8'). When the HTML arrives in the browser, the BOM is present at the beginning of the HTTP response body. This generates a very interesting error in Chrome:

Extra <html> encountered. Migrating attributes back to the original <html> element and ignoring the tag.

Chrome seems to generate an <html> tag automatically when it sees the BOM and mistakes it for content, making the real <html> tag an error.

So, using Python, what is the best way to remove the BOM from my UTF-8 encoded templates (if it exists -- I can't guarantee this in the future)?

For other text-based files like CSS, will major browsers correctly interpret (or ignore) the BOM? They are being sent as plain binary data without .decode('utf-8').

Note: I am using Python 2.5.

Thanks!

Recommended answer

Since you state:

"All of my (text) files are currently stored in UTF-8 with the BOM"

then use the 'utf-8-sig' codec to decode them:

>>> s = u'Hello, world!'.encode('utf-8-sig')
>>> s
'\xef\xbb\xbfHello, world!'
>>> s.decode('utf-8-sig')
u'Hello, world!'

It automatically removes the expected BOM, and works correctly if the BOM is not present as well.
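As a rough sketch of how this might be wrapped up in application code, the two helpers below cover both cases from the question: templates that are decoded to text, and files such as CSS that are sent as raw bytes. The function names decode_template and strip_bom are hypothetical and do not come from the original post. The sketch is written for Python 3, where the data read from the DB is a bytes object; under Python 2.5 the same calls should apply to a plain str. codecs.BOM_UTF8 is the standard library constant for the three BOM bytes.

```python
import codecs


def decode_template(raw):
    """Decode UTF-8 bytes to text, dropping a leading BOM if one is present."""
    # 'utf-8-sig' skips the BOM when it is there and behaves like
    # plain 'utf-8' when it is not.
    return raw.decode("utf-8-sig")


def strip_bom(raw):
    """Remove a leading UTF-8 BOM from raw bytes without decoding them."""
    # Useful for resources that are served as binary data (for example CSS).
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw
```

With these in place, a template would go through decode_template(raw) in place of template.decode('utf-8'), and a CSS resource would go through strip_bom(raw) before being written to the response.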
