Parsing XML and HTML pages with lxml and the requests package in Python
Question
I have been trying to parse XML and HTML pages using the lxml and requests packages in Python. I am using the following code for this purpose:
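import requests
from lxml import html  # html.fromstring lives in lxml.html, not lxml.etree

url = ""
req = requests.get(url)
# Parse the raw response bytes into an HTML element tree
tree = html.fromstring(req.content)
root = tree.xpath('')
for item in root:
    print(item.text)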
This code works fine, but for some web pages the content is not displayed properly. Those pages need the encoding set to UTF-8, and I don't know how to set the encoding in this code.
Recommended answer
requests automatically decodes content from the server.
Important notes:
r.content - contains the raw response content (bytes), not yet decoded
r.encoding - contains information about the encoding of the response content
r.text - according to the official documentation, the already decoded version of r.content
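To make the difference between undecoded bytes and decoded text concrete, here is a minimal sketch that needs no network access. The hard-coded byte string is only a stand-in for r.content, and the variable names are chosen for illustration. It shows how the same bytes read correctly when decoded as UTF-8 and turn into garbled characters when decoded with a different encoding, which is the kind of symptom described in the question.

```python
# Stand-in for r.content: UTF-8 encoded bytes, hard-coded for illustration.
content = "café".encode("utf-8")

# Decoding with the encoding the bytes were written in gives the original text.
right = content.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 succeeds but produces mojibake.
wrong = content.decode("iso-8859-1")

print(right)
print(wrong)
```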
Following the Unicode standard, I have gotten used to working with r.text, but you can still decode the content manually using
r.content.decode(r.encoding)
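One caveat with decoding manually: r.encoding can be None when the server does not declare a charset, and decoding with None as the encoding name raises an error. The sketch below is one possible way to handle that, not something from the original answer. The helper name decode_body and its fallback parameter are hypothetical, and the sample bytes stand in for r.content.

```python
from typing import Optional


def decode_body(content: bytes, encoding: Optional[str], fallback: str = "utf-8") -> str:
    """Decode raw response bytes, using a fallback when no encoding is known."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return content.decode(encoding or fallback, errors="replace")


# Stand-in for r.content; with a real response: decode_body(r.content, r.encoding)
body = "<p>Привет</p>".encode("utf-8")

print(decode_body(body, None))      # no declared encoding, falls back to UTF-8
print(decode_body(body, "utf-8"))   # declared encoding is used as-is
```

Alternatively, requests allows assigning r.encoding = "utf-8" before reading r.text, in which case r.text is decoded with that encoding and can be passed to html.fromstring instead of r.content.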
Hope that helps.