Garbled output when using BeautifulSoup in Python

Views: 271 Published: 2017/9/6 11:20:15 beautifulsoup python


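Problem description

Question

Chinese text returned by BeautifulSoup's find_all method prints as what looks like garbled ASCII escape codes. The result is an object, so I cannot call decode() or encode() on it, and I don't know how to convert it.
The code is as follows: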

#coding:utf-8

import urllib2
from sgmllib import SGMLParser
from bs4 import BeautifulSoup
import re
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')   
soup=BeautifulSoup(open('test.html'),"lxml")
s=soup.find_all("title")
print s

The output is:

[<title>\n    \u91cd\u5e86 - \u5728\u7ebf\u8d2d\u7968&amp;\u5f71\u8baf\n</title>]
[Finished in 0.7s]

Calling decode() on the result raises the following error:

Traceback (most recent call last):
  File "G:\Work\code\python\3.py", line 13, in <module>
    print s.decode()
AttributeError: 'ResultSet' object has no attribute 'decode'
[Finished in 0.8s with exit code 1]

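The official documentation also contains this sentence:

If a byte string is passed in, Beautiful Soup treats it as UTF-8 encoded; you can pass in Unicode instead to keep Beautiful Soup from getting the encoding wrong when parsing.

I'm a beginner and really don't know how to pass in Unicode so that Beautiful Soup does not get the encoding wrong. I hope someone more experienced can explain; I'd be very grateful!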
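
For reference, "passing in Unicode" in the quoted sentence can be read as decoding the file yourself before handing the text to Beautiful Soup. The following is only a minimal sketch of that idea: the helper name read_markup is hypothetical, and UTF-8 is an assumption about how test.html was saved.

```python
import io
import os
import tempfile


def read_markup(path, encoding="utf-8"):
    """Return the file's contents as Unicode text, decoded with `encoding`.

    Hypothetical helper; the default encoding is an assumption.
    """
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


# Demo with a throwaway file standing in for test.html.
sample = u"<title>\u91cd\u5e86</title>"
fd, path = tempfile.mkstemp(suffix=".html")
with io.open(fd, "w", encoding="utf-8") as handle:
    handle.write(sample)

markup = read_markup(path)
os.remove(path)

print(markup == sample)  # True
```

The returned text can then be given to BeautifulSoup as its first argument in place of the open file object, for example BeautifulSoup(read_markup('test.html'), "lxml").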

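Solution

This has nothing to do with BeautifulSoup; it comes from printing a list. When Python 2 prints a list, it shows the repr() of each element, and non-ASCII characters such as Chinese appear there as \uXXXX escapes. They display normally only when an element is printed on its own. Also, your output is not garbled: the result is correct and is simply shown in Unicode-escaped form. Try changing it to this: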

s=soup.find_all("title")
print s[0].encode('utf-8')
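
The difference between the escaped form and the readable form can be seen without BeautifulSoup at all. The sketch below is written for Python 3, where ascii() is used as a stand-in for the escaping that Python 2 applied when printing a list; the unescape helper is hypothetical and only illustrates how escaped text could be converted back.

```python
# Python 3 sketch. ascii() stands in for the escaping that Python 2 applied
# to each element when it printed a list of unicode strings.

def unescape(text):
    """Turn literal \\uXXXX sequences in an ASCII-only string into characters.

    Hypothetical helper, not part of BeautifulSoup.
    """
    return text.encode("ascii").decode("unicode_escape")


title = "\u91cd\u5e86"    # first two characters of the title in the question

escaped = ascii(title)    # escaped form, as in the list output above
readable = str(title)     # the same text, with nothing escaped

print(escaped)                              # '\u91cd\u5e86'
print(unescape("\\u91cd\\u5e86") == title)  # True
print(len(readable))                        # 2
```

Both forms describe the same two characters; only the way they are displayed differs, which is why printing the single element fixes the output.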
