Unwanted replacement of html entities by BeautifulSoup

Views: 202 Published: 2016/8/5 19:17:34 Tags: html utf-8 python-2.7 beautifulsoup html-entities

Problem description

I have some HTML containing MML that I am generating from Word documents using MathType. I have a Python script that uses BeautifulSoup to prettify it, but the problem is that it takes something like &#x2220; and turns it into the actual byte sequence 0xE2 0x88 0xA0, which is the UTF-8 encoding of the ∠ symbol. This is a problem because 0xE2 0x88 0xA0 won't display as ∠ in the browser. Instead, the browser interprets it as a series of Latin characters. The same thing happens with all the math entities, such as &Delta; &ang; &minus; &plus; and so on.
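To make the symptom concrete, here is a minimal sketch of what the three bytes are and why they can show up as Latin characters. It is written for Python 3 (the question is tagged python-2.7), and the variable names are illustrative only. It assumes the browser is decoding the page with a single-byte Latin encoding, which is one plausible cause and is not confirmed by the question.

```python
# Python 3 sketch; names here are illustrative, not from the original script.
angle = "\u2220"  # the character that the entity &#x2220; refers to

# Encoding the character as UTF-8 yields the three bytes from the question.
utf8_bytes = angle.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x88\xa0'

# Assumption: the page is decoded as a single-byte Latin encoding.
# Each byte then becomes its own character instead of one angle symbol.
misread = utf8_bytes.decode("latin-1")
print(len(misread))  # 3
```

If that assumption holds, the bytes themselves are valid and the mismatch lies in how the page's encoding is declared or detected.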

I looked through the BeautifulSoup documentation and I can see how to turn entities into byte sequences, but I'm not using that command; all I'm using is prettify(). I didn't see anything in the documentation about how to keep entities from being turned into byte sequences.

Does anyone know if there's a setting in BeautifulSoup that tells it not to change entities into byte sequences? I hope so, because it seems kind of dumb to have to undo the damage after prettify() runs :)
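One possible workaround, sketched here with only the Python standard library, is to post-process the prettified text so that every non-ASCII character is written back as a numeric character reference. This is not a BeautifulSoup setting. The helper name `to_ascii_with_char_refs` and the sample markup are hypothetical, and the sketch assumes Python 3 with the prettified output available as a text string.

```python
def to_ascii_with_char_refs(text):
    """Return text as ASCII, replacing each non-ASCII character
    with a decimal numeric character reference such as &#8736;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Stand-in for the string returned by prettify(); hypothetical sample markup.
pretty = "<mo>\u2220</mo><mi>\u0394</mi>"

print(to_ascii_with_char_refs(pretty))
# <mo>&#8736;</mo><mi>&#916;</mi>
```

The output uses decimal references (&#8736; is the same character as &#x2220;), so the original named or hexadecimal spelling is not restored. If Beautiful Soup 4 is in use, its documentation also describes output formatters for prettify(); whether they fit this case has not been checked against the asker's setup.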

Thanks in advance for your help!
