Web scraping - problem printing HTML with a Python crawler

Views: 230 Published: 2017/9/6 4:50:46 Tags: web scraping, python


Problem description

import urllib.request
import urllib.parse

page = 1
url = "http://www.qiushibaike.com/8hr/page/" + str(page)
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N)"
}
request = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(request).read()
html = response.decode("utf-8")
print(html)

Running it raises this error:
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 18194-18195: invalid continuation byte
Changing 'utf-8' to 'GBK' does not help; decoding still fails. How can this be solved?

Solution

Replace the Android User-Agent with the following one:

"User-Agent": "Mozilla/5.0 (windows 6.0)"

Complete script (Python 3):

import urllib.request
url = "http://www.qiushibaike.com/8hr/page/1"
headers = {
    # Original Android User-Agent from the question, kept for comparison:
    # "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N)"
    "User-Agent": "Mozilla/5.0 (windows 6.0)"
}
request = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(request)
c = response.read()
h = c.decode('utf-8')
print(h)
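
The answer does not say why the decode failed, and switching the User-Agent may not help on every page. The error is reported at position 18194, so the bytes before that point decoded as UTF-8 without complaint. That suggests one alternative: decode defensively instead of letting a few bad bytes abort the script. The following is a minimal sketch, not a definitive fix. It assumes a hypothetical helper named `decode_body` (it is not part of urllib) and assumes that replacing undecodable bytes with a placeholder character is acceptable:

```python
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode a response body without raising UnicodeDecodeError.

    `decode_body` is a hypothetical helper written for illustration.
    `charset` is whatever encoding the server announced, if any.
    """
    # Try the announced charset first, then fall back to UTF-8.
    for encoding in (charset, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers charset names Python does not know.
            continue

    # Last resort: keep the readable text and substitute U+FFFD
    # for each byte sequence that cannot be decoded.
    return raw.decode("utf-8", errors="replace")
```

In the script above it could be called as `decode_body(response.read(), response.headers.get_content_charset())`, where `get_content_charset()` returns the charset from the Content-Type header, or None when the server does not send one.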
