Can't Scrape Webpage with Python Requests Library

Views: 90 | Published: 2018/6/14 18:25:57 | Tags: python html xpath


Question

I am trying to get some info from a webpage (link below) using Requests in Python; however, the HTML that I see in my browser doesn't seem to exist when I connect via Python's requests library. None of my XPath queries return any information. I am able to use requests for other sites such as Amazon (the site below is actually owned by Amazon, but I can't seem to scrape any information from it).

import requests
from lxml import html

url = 'http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search&ref=qd_men_sr_1_0'
user_agent = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=user_agent)
tree = html.fromstring(page.text)
# The id value must be quoted; unquoted, XPath reads ourPrice as a child element name.
query = tree.xpath("//span[@id='ourPrice']/text()")
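
One detail worth checking in the URL above: everything after the `#` is a fragment, and HTTP clients do not send fragments to the server. The sketch below uses only the standard library to split the URL and show what the server would actually be asked for. The variable name `request_target` is just an illustrative label, not part of any library.

```python
from urllib.parse import urlsplit

url = ('http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS'
       '&cAsin=B00DNNZIIK&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673'
       '&sindex=0&discovery=search&ref=qd_men_sr_1_0')

parts = urlsplit(url)

# The request line carries the path plus any query string, never the fragment.
request_target = parts.path + ('?' + parts.query if parts.query else '')

print("sent to server:", request_target)
print("kept client-side:", parts.fragment)
```

Here the server only sees a request for `/`, so the product-specific parameters are presumably read by the page's own JavaScript in the browser, which a plain HTTP fetch never runs.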

Recommended answer

The element is generated by JavaScript, so it is not in the HTML that requests receives. You can use selenium to get the rendered source; for headless browsing, combine it with PhantomJS:

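from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.myhabit.com/#page=d&dept=men&asin=B00R5TK3SS&cAsin=B00DNNZIIK&qid=aps-0QRWKNQG094M3PZKX5ST-1429238272673&sindex=0&discovery=search&ref=qd_men_sr_1_0'

# PhantomJS runs the page's JavaScript headlessly, so page_source holds the rendered DOM.
# Note: webdriver.PhantomJS was deprecated in later Selenium 3 releases and removed in Selenium 4.
browser = webdriver.PhantomJS()
browser.get(url)
_html = browser.page_source
browser.quit()

# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning.
print(BeautifulSoup(_html, "html.parser").find("span", {"id": "ourPrice"}).text)
# Output: $50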
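
Before reaching for a browser, it can help to confirm that the element really is missing from the raw response. The following is a minimal sketch using only the standard library. `IdFinder`, `has_element_with_id`, and the two sample HTML strings are hypothetical stand-ins invented for illustration, not markup taken from the actual site.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Record whether any start tag carries the given id attribute."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if dict(attrs).get("id") == self.target_id:
            self.found = True


def has_element_with_id(markup, target_id):
    """Return True if the markup contains an element with this id."""
    finder = IdFinder(target_id)
    finder.feed(markup)
    finder.close()
    return finder.found


# Hypothetical stand-ins: a raw server response vs. a browser-rendered page.
raw_html = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"
rendered_html = "<html><body><div id='app'><span id='ourPrice'>$50</span></div></body></html>"

print(has_element_with_id(raw_html, "ourPrice"))
print(has_element_with_id(rendered_html, "ourPrice"))
```

In practice you would pass the text of the requests response instead of `raw_html`. If the check comes back `False` there while the element is visible in the browser's inspector, the element is most likely built client-side and a rendering tool such as selenium is needed.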
