Extracting href URL with Python Requests
Problem description
I would like to extract the URL from an XPath using the requests package in Python. I can get the text, but nothing I try gives the URL. Can anyone help?
ipdb> webpage.xpath(xpath_url + '/text()')
['Text of the URL']
ipdb> webpage.xpath(xpath_url + '/a()')
*** lxml.etree.XPathEvalError: Invalid expression
ipdb> webpage.xpath(xpath_url + '/href()')
*** lxml.etree.XPathEvalError: Invalid expression
ipdb> webpage.xpath(xpath_url + '/url()')
*** lxml.etree.XPathEvalError: Invalid expression
I used this tutorial to get started: http://docs.python-guide.org/en/latest/scenarios/scrape/
It seems like it should be easy, but nothing comes up during my searching.
Thank you.
Solution
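Have you tried webpage.xpath(xpath_url + '/@href')? href is an attribute of the a element, and XPath selects attributes with the @ prefix. Steps such as '/a()', '/href()' and '/url()' are not valid XPath, which is why lxml raises XPathEvalError for them.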
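To compare '/text()' and '/@href' without fetching anything, here is a minimal sketch that parses an inline HTML snippet. The markup, the xpath_url value and the example.com links are made up for illustration and are not from the question.

```python
from lxml import html

# Hypothetical markup standing in for the real page
SAMPLE_HTML = """
<html><body>
  <div id="links">
    <a href="http://example.com/page2.html">Text of the URL</a>
    <a href="/relative/page3.html">Another link</a>
  </div>
</body></html>
"""

webpage = html.fromstring(SAMPLE_HTML)

# Hypothetical XPath pointing at the <a> elements of interest
xpath_url = '//div[@id="links"]/a'

# text() selects the text nodes inside each <a>
texts = webpage.xpath(xpath_url + '/text()')

# @href selects the href attribute of each <a>
hrefs = webpage.xpath(xpath_url + '/@href')

# Selecting the elements themselves gives access to both at once
links = [(a.text_content(), a.get('href')) for a in webpage.xpath(xpath_url)]

print(texts)
print(hrefs)
print(links)
```

Selecting the elements and calling get('href') on each keeps every link's text and URL together, which may be more convenient than running two separate queries.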
Here is the full code:
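from lxml import html
import requests

# Download the page and parse it into an lxml element tree
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
webpage = html.fromstring(page.content)

# '@href' selects the href attribute of every <a> element
print(webpage.xpath('//a/@href'))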
The result should be:
[
'http://econpy.pythonanywhere.com/ex/002.html',
'http://econpy.pythonanywhere.com/ex/003.html',
'http://econpy.pythonanywhere.com/ex/004.html',
'http://econpy.pythonanywhere.com/ex/005.html'
]
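The hrefs in this result are already absolute. On other pages @href may return relative paths, since XPath returns the attribute value exactly as written in the markup. A minimal sketch of resolving them against the page URL with the standard library's urljoin follows; the hrefs list is hypothetical, chosen to show one relative, one root-relative and one absolute value.

```python
from urllib.parse import urljoin

# URL the page was fetched from
base_url = 'http://econpy.pythonanywhere.com/ex/001.html'

# Hypothetical values, as they might come back from webpage.xpath('//a/@href')
hrefs = [
    '002.html',
    '/ex/003.html',
    'http://econpy.pythonanywhere.com/ex/004.html',
]

# urljoin resolves relative values and leaves absolute URLs unchanged
absolute = [urljoin(base_url, href) for href in hrefs]
print(absolute)
```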