Prevent BeautifulSoup's renderContents() from changing &nbsp; to Â

Published: 2020/9/20 8:00:18. Tags: python, python-3.x, utf-8, beautifulsoup

Problem description

I'm using bs4 to do some work on some text, but in some cases it converts &nbsp; characters to Â. The best I can tell, this is an encoding mismatch from UTF-8 to Latin-1 (or the reverse?).
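One way to check which direction the mismatch goes, using only the standard library: encode a non-breaking space (U+00A0, the character &nbsp; stands for) as UTF-8, then decode those bytes as Latin-1. This is an illustration of the mismatch itself, not a claim about what bs4 does internally.

```python
# U+00A0 is the character that the &nbsp; entity stands for.
nbsp = "\u00a0"

# UTF-8 encodes it as two bytes.
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0'

# Reading those two bytes as Latin-1 maps each byte to its own character:
# 0xC2 -> 'Â' and 0xA0 -> U+00A0, so the space gains a stray 'Â'.
mojibake = utf8_bytes.decode("latin-1")
print(repr(mojibake))  # 'Â\xa0'
```

So the symptom fits UTF-8 bytes being read as Latin-1 (Windows-1252 maps these two bytes the same way), not the reverse.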

Everything in my web app is UTF-8, Python 3 is UTF-8, and I've confirmed the database is UTF-8.

I've narrowed down the problem to this one line:

from bs4 import BeautifulSoup

text = "&nbsp;"  # input as shown by the first print below
print("Before soup: " + text)  # Before soup: &nbsp;
soup = BeautifulSoup(text, "html.parser")
# ... do stuff to soup, but all commented out for this testing.
soup = BeautifulSoup(soup.renderContents(), "html.parser")  # <---- PROBLEM!
print(soup.renderContents())  # b'\xc3\x82\xc2\xa0'
print("After SOUP: " + str(soup))  # After SOUP: Â
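The byte string printed above can be reproduced without bs4. renderContents() returns bytes (UTF-8 by default), and handing bytes back to BeautifulSoup() makes it guess their encoding. The sketch below assumes that guess came out as a single-byte encoding such as Latin-1; the question does not show which detector made the guess. Decoding the same bytes explicitly as UTF-8 keeps the space intact. In bs4 terms, the likely remedies are keeping the markup as a str (for example via soup.decode_contents()) or passing from_encoding="utf-8" when reparsing bytes, but the snippet only models the encoding steps, not the library.

```python
# First render: the parsed &nbsp; (U+00A0) serialised as UTF-8 bytes.
first_render = "\u00a0".encode("utf-8")  # b'\xc2\xa0'

# Assumed wrong guess on reparse: the bytes are read as Latin-1,
# turning one character into two ('Â' followed by U+00A0).
reparsed_wrong = first_render.decode("latin-1")

# Second render: those two characters are encoded as UTF-8 again.
second_render = reparsed_wrong.encode("utf-8")
print(second_render)  # b'\xc3\x82\xc2\xa0', the same bytes as in the question

# Decoding with the encoding that was actually used avoids the extra 'Â'.
reparsed_right = first_render.decode("utf-8")
print(repr(reparsed_right))  # '\xa0'
```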

How do I prevent renderContents() from changing the encoding? There is no documentation on this function!

Upon further research into the docs, this seems to be the key, but I still can't fix the problem!

print(soup.prettify(formatter="html"))  # &Acirc;&nbsp;
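The &Acirc;&nbsp; output suggests the damage is already in the parse tree rather than in the formatter: an entity-based formatter just names each character it finds, so two entities mean the tree holds two characters. A rough stdlib approximation of that behaviour, using html.entities.codepoint2name (the helper to_named_entities is hypothetical and is not bs4's actual formatter):

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Hypothetical helper: replace non-ASCII characters that have an
    HTML entity name with &name;, leaving everything else unchanged."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        # Skip ASCII (e.g. '&', '<') so only non-ASCII characters are named.
        if name is not None and ord(ch) > 127:
            out.append(f"&{name};")
        else:
            out.append(ch)
    return "".join(out)


print(to_named_entities("\u00c2\u00a0"))  # &Acirc;&nbsp;
print(to_named_entities("\u00a0"))        # &nbsp;
```

An undamaged tree would produce a single &nbsp;, so the formatter is reporting the corruption, not causing it.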