UTF-8 HTML and CSS files with BOM (and how to remove the BOM with Python)

Views: 25 | Published: 2021/12/28 16:51:33 | Tags: python, file, utf-8, byte-order-mark

Problem description

First, some background: I'm developing a web application using Python. All of my (text) files are currently stored in UTF-8 with the BOM. This includes all my HTML templates and CSS files. These resources are stored as binary data (BOM and all) in my DB.

When I retrieve the templates from the DB, I decode them using template.decode('utf-8'). When the HTML arrives in the browser, the BOM is present at the beginning of the HTTP response body. This generates a very interesting error in Chrome:

Extra <html> encountered. Migrating attributes back to the original <html> element and ignoring the tag.

Chrome seems to generate an <html> tag automatically when it sees the BOM and mistakes it for content, making the real <html> tag an error.
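The symptom can be reproduced in isolation. The following is a minimal sketch, written for Python 3 rather than the Python 2.5 mentioned in the question, and the `raw` bytes are a made-up stand-in for a template fetched from the database. It shows that the plain 'utf-8' codec keeps the BOM as the character U+FEFF at the start of the decoded text, which is then sent ahead of the markup:

```python
import codecs

# Hypothetical template bytes as they might come back from the database:
# the UTF-8 BOM (bytes EF BB BF) followed by the markup.
raw = codecs.BOM_UTF8 + b"<html><body>Hi</body></html>"

# The plain 'utf-8' codec does not treat the BOM specially.
text = raw.decode("utf-8")

# The first character of the decoded text is U+FEFF, not '<'.
first_is_bom = (text[0] == "\ufeff")
starts_with_tag = text.startswith("<html>")

print(first_is_bom)
print(starts_with_tag)
```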

So, using Python, what is the best way to remove the BOM from my UTF-8 encoded templates (if it exists -- I can't guarantee this in the future)?

For other text-based files like CSS, will major browsers correctly interpret (or ignore) the BOM? They are being sent as plain binary data without .decode('utf-8').

Note: I am using Python 2.5.

Thanks!

Solution

Since you state:

"All of my (text) files are currently stored in UTF-8 with the BOM"

then use the 'utf-8-sig' codec to decode them:

>>> s = u'Hello, world!'.encode('utf-8-sig')
>>> s
'\xef\xbb\xbfHello, world!'
>>> s.decode('utf-8-sig')
u'Hello, world!'

It automatically removes the expected BOM, and works correctly if the BOM is not present as well.
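As a sketch of how this might be wrapped for reuse, the two helpers below cover both cases raised in the question: templates that are decoded to text, and files such as CSS that are sent as raw bytes. The function names `decode_template` and `strip_bom_bytes` and the sample data are illustrative assumptions, not part of the original answer, and the code is written for Python 3 (the `b''` literals are not available in Python 2.5, where plain `str` literals would be used instead).

```python
import codecs


def decode_template(raw):
    """Decode UTF-8 bytes to text, dropping a leading BOM if one is present."""
    # 'utf-8-sig' skips a leading BOM and otherwise behaves like 'utf-8'.
    return raw.decode("utf-8-sig")


def strip_bom_bytes(raw):
    """Return the bytes without a leading UTF-8 BOM, without decoding them."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# Hypothetical sample data: the same stylesheet with and without a BOM.
without_bom = b"body { color: red; }"
with_bom = codecs.BOM_UTF8 + without_bom

print(decode_template(with_bom) == decode_template(without_bom))
print(strip_bom_bytes(with_bom) == without_bom)
```

Stripping at the byte level avoids a decode and re-encode round trip for files that are passed through unchanged, while the codec approach is simpler when the content has to be decoded anyway.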
