python - How do I fix garbled output when a crawler fetches website data?

Views: 297 Published: 2017/9/6 0:52:23

Problem description

Question

#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib2
import re
import HTMLParser

class WALLSTREET:
    def __init__(self, baseUrl):
        self.url = baseUrl
    def get_html_content(self):
        url = self.url
        response = urllib2.urlopen(url)
        str = response.read()
        print str
baseUrl="https://wallstreetcn.com/live/global" # URL of the Wallstreetcn (华尔街见闻) live page
ws = WALLSTREET(baseUrl)
ws.get_html_content()

That is the whole program. It is very simple, but what print outputs is garbled.
I tried print str.decode('utf-8')
but that raises an error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte

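Solution

The line str = response.read() has two problems:
1. str is the name of a built-in type (not a reserved keyword, so the assignment is legal), but assigning to it shadows the built-in. Rename the variable, for example to html.
2. Check which encoding the page source declares. If it is utf-8, append .decode('utf-8') after read(); if it is something else, decode with that encoding instead.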
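
The traceback in the question hints at a step the answer does not cover. The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so the server has probably returned a gzip-compressed body, which would have to be decompressed before any .decode() call can succeed. The following is only a sketch under that assumption: decode_body and GZIP_MAGIC are hypothetical names, and the utf-8 default is a guess that should be checked against the encoding the page declares.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw, charset="utf-8"):
    """Turn raw response bytes into text, un-gzipping first if needed.

    decode_body is a hypothetical helper, not part of the original post.
    """
    if raw[:2] == GZIP_MAGIC:
        # Assumption: the body is gzip-compressed, so decompress before decoding.
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw.decode(charset)
```

In the question's code this would be used as html = decode_body(response.read()) in place of str = response.read(). The helper uses only the standard library and should behave the same way under Python 2.7 and Python 3.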

A small suggestion: for a small program like this, a plain function is more convenient than a class, both to write and to use.
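
One way the function-based version might look is sketched below. It keeps the name get_html_content from the question. The charset and opener parameters are hypothetical additions for illustration. The sketch assumes the body arrives as uncompressed text in the given charset, so it illustrates the shape of the code rather than a complete fix for the error above.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2, as in the question


def get_html_content(url, charset="utf-8", opener=urlopen):
    """Fetch url and return its body decoded as text.

    opener is a hypothetical hook that defaults to the standard urlopen.
    """
    response = opener(url)
    try:
        # Assumption: the body is plain, uncompressed text in `charset`.
        return response.read().decode(charset)
    finally:
        response.close()
```

With this, the two lines that construct WALLSTREET and call its method collapse into a single call such as get_html_content(baseUrl), and the result can be printed or processed further by the caller.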
