Can't Scrape Webpage with Python Requests Library
Problem description
I am trying to get some information from a webpage (link below) using Requests in Python; however, the HTML that I see in my browser doesn't seem to exist when I connect via Python's requests library. None of my XPath queries return any information. I am able to use requests for other sites such as Amazon (the site below is actually owned by Amazon, but I can't seem to scrape any information from it).
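import requests
from lxml import html
url = 'http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search&ref=qd_men_sr_1_0'
user_agent = {'User-agent': 'Mozilla/5.0'}
page = requests.get(url, headers=user_agent)
tree = html.fromstring(page.text)
# The id value must be quoted; unquoted, XPath compares @id with a child element named ourPrice
# Per the answer below, this is still [] because the span is built by JavaScript
query = tree.xpath("//span[@id='ourPrice']/text()")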
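A related detail worth checking: in this URL every product parameter sits after the `#`, in the fragment. HTTP clients, browsers included, never send the fragment to the server, so the server only sees a request for `/`; it is the page's JavaScript that reads the fragment and decides which product to display. The sketch below uses only `urllib.parse` to show what ends up in the request line. `request_target` is a hypothetical helper that mimics how the request target is formed; it is not part of requests.

```python
from urllib.parse import urlsplit

# URL from the question; everything after '#' is the fragment.
url = ('http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK'
       '&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search'
       '&ref=qd_men_sr_1_0')


def request_target(u):
    """Return the path and query an HTTP client puts in the request line.

    The fragment is client-side only and is dropped before the request is sent.
    """
    parts = urlsplit(u)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


print(request_target(url))           # /
print(urlsplit(url).fragment[:15])   # page=d&dept=men
```

So every variant of this product URL fetches the same document from the server, and the product-specific HTML can only appear after a browser engine runs the page's scripts.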
Recommended answer
The element is generated by JavaScript, so it is not in the HTML that requests downloads. You can use Selenium to get the page source after the scripts have run; for headless browsing, combine it with PhantomJS:
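from selenium import webdriver
from bs4 import BeautifulSoup
url = 'http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search&ref=qd_men_sr_1_0'
# webdriver.PhantomJS exists only in older Selenium releases (3.x) and needs the phantomjs binary on PATH
browser = webdriver.PhantomJS()
browser.get(url)
_html = browser.page_source  # current DOM, serialized after the page's JavaScript has run
browser.quit()  # shut down the PhantomJS process
print(BeautifulSoup(_html, "html.parser").find("span", {"id": "ourPrice"}).text)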
$50
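Recent Selenium 4 releases no longer ship the PhantomJS driver (it was deprecated during the 3.x series), so here is a sketch of the same idea with headless Chrome instead. This is a swapped-in browser, not the answer's original setup, and it assumes Selenium 4 plus a local Chrome install. It also adds an explicit wait, since `page_source` read immediately after `get()` can predate the script that inserts the price. The names `fetch_rendered_html`, `extract_span_text` and `_SpanTextFinder` are hypothetical helpers; the parsing step uses the standard library's `html.parser` only to keep the sketch light, and BeautifulSoup works just as well.

```python
from html.parser import HTMLParser


class _SpanTextFinder(HTMLParser):
    """Collect the text inside the first <span> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # > 0 while inside the target span (counts nested spans)
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1   # nested span inside the target
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.depth = 1
            self.found = True

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_span_text(html_source, span_id="ourPrice"):
    """Return the stripped text of span#span_id, or None if it is absent."""
    parser = _SpanTextFinder(span_id)
    parser.feed(html_source)
    parser.close()
    return "".join(parser.chunks).strip() if parser.found else None


def fetch_rendered_html(url, wait_for_id="ourPrice", timeout=10):
    """Load url in headless Chrome and return the DOM after JavaScript ran.

    Assumes Selenium 4 and Chrome are installed.
    """
    # Imported lazily so extract_span_text works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the script-generated element exists (or raise TimeoutException).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, wait_for_id)))
        return driver.page_source
    finally:
        driver.quit()
```

With the `url` from the question, usage would be `extract_span_text(fetch_rendered_html(url))`. Waiting on the element's id rather than sleeping for a fixed time means the call returns as soon as the price is in the DOM.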