Python and BeautifulSoup encoding issues

Views: 267 | Published: 2016/8/5 18:53:01 | Tags: python unicode utf-8 beautifulsoup

Question

I'm writing a crawler with Python using BeautifulSoup, and everything was going swimmingly till I ran into this site:

http://www.elnorte.ec/

I'm getting the contents with the requests library:

import requests

r = requests.get('http://www.elnorte.ec/')
content = r.content  # raw bytes, not yet decoded to text

If I print the content variable at that point, all the Spanish special characters seem to be working fine. However, once I feed the content variable to BeautifulSoup, it all gets messed up:

soup = BeautifulSoup(content)
print(soup)
...
<a class="blogCalendarToday" href="/component/blog_calendar/?year=2011&amp;month=08&amp;day=27&amp;modid=203" title="1009 artÃculos en este dÃa">
...

It's apparently garbling all the Spanish special characters (accents and whatnot). I've tried content.decode('utf-8') and content.decode('latin-1'), and I've also tried messing around with BeautifulSoup's fromEncoding parameter, setting it to fromEncoding='utf-8' and fromEncoding='latin-1', but still no dice.

Any pointers would be much appreciated.
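The artÃculos / dÃa pattern in the output is the usual signature of UTF-8 bytes being decoded with a single-byte codec such as Latin-1 or Windows-1252: í is the two bytes C3 AD in UTF-8, which Latin-1 reads as Ã followed by an invisible soft hyphen (U+00AD). A small standard-library reproduction, using a sample string taken from the output above:

```python
# Reproduce the kind of garbling seen in the BeautifulSoup output.
original = "1009 artículos en este día"

# Encode as UTF-8 (the bytes a server would send for a UTF-8 page) ...
utf8_bytes = original.encode("utf-8")

# ... then decode with the wrong codec (Latin-1), as a parser that guessed wrong would.
garbled = utf8_bytes.decode("latin-1")
print(repr(garbled))  # '1009 artÃ\xadculos en este dÃ\xada'

# The damage is reversible: re-encode with Latin-1 to get the original bytes back,
# then decode them with the correct codec.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

Because Latin-1 maps every byte to exactly one code point, this round trip is lossless. With Windows-1252 the same trick can break, because that codec leaves a few byte values (such as 0x81 and 0x9D) undefined.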

Recommended answer

You could try:

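import urllib
import BeautifulSoup  # BeautifulSoup 3, Python 2

r = urllib.urlopen('http://www.elnorte.ec/')
x = BeautifulSoup.BeautifulSoup(r.read())  # read() must be called; r.read alone is the method object
r.close()

print x.prettify('latin-1')  # serialize the parse tree as Latin-1 bytes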

I get the correct output.
Heh, in this particular case you could also do x.__str__(encoding='latin1').

I guess this is because the content is in ISO-8859-1 (or ISO-8859-15) while the meta http-equiv Content-Type tag incorrectly says "UTF-8".

Can you confirm?
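One way to test this guess is to compare what the page declares with what its bytes actually are. Below is a minimal standard-library sketch, assuming you already have the raw response bytes (for example r.content from requests). MetaCharsetFinder and check_encoding are hypothetical names chosen for illustration. The sketch only reads <meta charset> and <meta http-equiv="Content-Type">; the HTTP Content-Type header is a separate source of truth you would inspect on its own (e.g. r.headers.get('Content-Type')).

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: record the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.declared = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.declared:
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if "charset" in attr_map:
            # HTML5 style: <meta charset="utf-8">
            self.declared = attr_map["charset"].strip().lower()
        elif attr_map.get("http-equiv", "").lower() == "content-type":
            # Older style: content="text/html; charset=utf-8"
            for part in attr_map.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset":
                    self.declared = value.strip().lower()


def check_encoding(raw):
    """Return the declared meta charset and whether raw is strictly valid UTF-8."""
    finder = MetaCharsetFinder()
    # Latin-1 maps every byte to a character, so this never fails and leaves ASCII tags intact.
    finder.feed(raw.decode("latin-1"))
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"declared": finder.declared, "valid_utf8": valid_utf8}


# A page whose meta tag claims UTF-8 but whose body is Latin-1 encoded.
page = ('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        '<p>d\xeda</p>').encode("latin-1")
print(check_encoding(page))  # {'declared': 'utf-8', 'valid_utf8': False}
```

If the meta tag says UTF-8 but a strict UTF-8 decode fails, that supports the diagnosis above. A successful decode only suggests, and does not prove, that the page is UTF-8, since pure-ASCII content is valid in both encodings.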
