Extracting href URL with Python Requests

Problem description

I would like to extract the URL from an XPath match using the requests package in Python. I can get the text, but nothing I try gives me the URL. Can anyone help?

ipdb> webpage.xpath(xpath_url + '/text()')
['Text of the URL']
ipdb> webpage.xpath(xpath_url + '/a()')
*** lxml.etree.XPathEvalError: Invalid expression
ipdb> webpage.xpath(xpath_url + '/href()')
*** lxml.etree.XPathEvalError: Invalid expression
ipdb> webpage.xpath(xpath_url + '/url()')
*** lxml.etree.XPathEvalError: Invalid expression

I used this tutorial to get started: http://docs.python-guide.org/en/latest/scenarios/scrape/

It seems like it should be easy, but nothing comes up during my searching.

Thank you.

Solution

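Have you tried webpage.xpath(xpath_url + '/@href')? In XPath, attributes are selected with the @ prefix. href is an attribute of the a element, not a function, which is why '/href()' is rejected as an invalid expression.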

Here is the full code:

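from lxml import html
import requests

# Download the page and parse the raw response bytes into an lxml tree
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
webpage = html.fromstring(page.content)

# Select the href attribute of every <a> element and print the list
print(webpage.xpath('//a/@href'))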

The result should be:

[
  'http://econpy.pythonanywhere.com/ex/002.html',
  'http://econpy.pythonanywhere.com/ex/003.html', 
  'http://econpy.pythonanywhere.com/ex/004.html',
  'http://econpy.pythonanywhere.com/ex/005.html'
]
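
To compare '/text()' and '/@href' on the same element without making a network request, here is a minimal sketch that parses an inline HTML string. The snippet and the value of xpath_url are made up for illustration; the question does not show the real ones.

```python
from lxml import html

# Hypothetical HTML standing in for the downloaded page
SNIPPET = '<div id="links"><a href="http://econpy.pythonanywhere.com/ex/002.html">Text of the URL</a></div>'

webpage = html.fromstring(SNIPPET)

# Hypothetical XPath that points at the link element
xpath_url = '//div[@id="links"]/a'

# text() selects the text node(s) inside the link
link_text = webpage.xpath(xpath_url + '/text()')

# @href selects the value(s) of the link's href attribute
link_href = webpage.xpath(xpath_url + '/@href')

print(link_text)
print(link_href)
```

Both calls return lists, so take the first item (after checking the list is not empty) if a single URL is expected.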
