Can't Scrape Webpage with Python Requests Library

Question

I am trying to get some information from a webpage (link below) using Requests in Python; however, the HTML that I see in my browser doesn't seem to exist when I connect via Python's Requests library. None of my XPath queries return anything. I am able to use Requests with other sites such as Amazon (the site below is actually owned by Amazon), but I can't seem to scrape any information from this one.

import requests
from lxml import html

url = 'http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search&ref=qd_men_sr_1_0'
user_agent = {'User-agent': 'Mozilla/5.0'}
page = requests.get(url, headers=user_agent)
tree = html.fromstring(page.text)
# Quote the attribute value; unquoted, ourPrice is read as a child-element path.
query = tree.xpath("//span[@id='ourPrice']/text()")


Answer

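The element is generated by JavaScript, so it is not present in the HTML that Requests downloads. You can use Selenium to load the page in a browser engine and read the rendered source; for headless browsing, combine it with PhantomJS: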

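# Note: webdriver.PhantomJS needs an older Selenium release (3.x) and the phantomjs
# binary on PATH. PhantomJS is no longer maintained and newer Selenium releases
# have dropped it; current code would use headless Chrome or Firefox instead.
from selenium import webdriver
from bs4 import BeautifulSoup

url = 'http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search&ref=qd_men_sr_1_0'

browser = webdriver.PhantomJS()  # headless browser that executes the page's JavaScript
try:
    browser.get(url)
    _html = browser.page_source  # HTML of the rendered DOM, after scripts have run
finally:
    browser.quit()  # always shut down the PhantomJS process

soup = BeautifulSoup(_html, "html.parser")
print(soup.find("span", {"id": "ourPrice"}).text)
# Output: $50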
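
A further clue is in the URL itself. Everything after `#` is a fragment identifier, which HTTP clients (browsers and Requests alike) do not send to the server, so the server only sees a request for the site's root page. Presumably the page's JavaScript reads the fragment (`page=d`, `asin=B00R5TK3SS`, and so on) and then loads the product details; that is an assumption based on the shape of the URL, not something the answer confirms. A minimal sketch with the standard library's `urllib.parse.urldefrag` shows which part of the URL is actually requested:

```python
from urllib.parse import urldefrag

url = ('http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK'
       '&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search'
       '&ref=qd_men_sr_1_0')

# urldefrag splits a URL into the part before '#' and the fragment after it.
# Only the first part is requested from the server; the fragment stays client-side.
# Assumption: the site's JavaScript routes on the fragment; this only shows the split.
sent_to_server, fragment = urldefrag(url)

print(sent_to_server)  # http://www.myhabit.com/
print(fragment)
```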

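To confirm that a page is rendered client-side before switching to a browser, you can check whether the target element exists in the HTML that Requests returned. A minimal sketch using only the standard library's `html.parser`; `find_text_by_id` is a hypothetical helper, and `raw_html` and `rendered_html` are made-up stand-ins for `page.text` from the question and `browser.page_source` from the answer:

```python
from html.parser import HTMLParser
from typing import Optional


class _IdTextFinder(HTMLParser):
    """Collects the text inside the first <tag id="element_id"> element."""

    def __init__(self, tag: str, element_id: str) -> None:
        super().__init__()
        self.tag = tag
        self.element_id = element_id
        self.depth = 0       # > 0 while inside the matching element
        self.found = False
        self.parts = []      # text chunks seen inside the element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested tags of the same name so the right end tag closes the match.
            if tag == self.tag:
                self.depth += 1
        elif not self.found and tag == self.tag and dict(attrs).get("id") == self.element_id:
            self.depth = 1
            self.found = True

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        # Script contents arrive here as plain data, never as start tags,
        # so an id mentioned only inside <script> cannot produce a match.
        if self.depth:
            self.parts.append(data)


def find_text_by_id(html_text: str, tag: str, element_id: str) -> Optional[str]:
    """Hypothetical helper: text of the first matching element, or None if absent."""
    finder = _IdTextFinder(tag, element_id)
    finder.feed(html_text)
    finder.close()
    return "".join(finder.parts).strip() if finder.found else None


# Made-up stand-ins: an app shell (what Requests downloads) and the
# DOM after the script has run (what the browser's page_source returns).
raw_html = """<html><body><div id="app"></div>
<script>render({"ourPrice": "$50"});</script></body></html>"""
rendered_html = """<html><body><div id="app">
<span id="ourPrice">$50</span></div></body></html>"""

print(find_text_by_id(raw_html, "span", "ourPrice"))       # None
print(find_text_by_id(rendered_html, "span", "ourPrice"))  # $50
```

Parsing the markup, rather than testing `'ourPrice' in page.text`, avoids a false positive when the id appears only inside script code, as it does in `raw_html` here.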