Using Python requests.get to parse html code that does not load at once

Published: 2021/12/17 13:43:19 Tags: python html web-scraping python-requests

Problem description

I am trying to write a Python script that will periodically check a website to see if an item is available. I have used requests.get, lxml.html, and xpath successfully in the past to automate website searches. In the case of this particular URL (http://www.anthropologie.com/anthro/product/4120200892474.jsp?cm_vc=SEARCH_RESULTS#/) and others on the same website, my code was not working.

import requests
from lxml import html
page = requests.get("http://www.anthropologie.com/anthro/product/4120200892474.jsp?cm_vc=SEARCH_RESULTS#/")
tree = html.fromstring(page.text)
html_element = tree.xpath(".//div[@class='product-soldout ng-scope']")

At this point, html_element should be a list of elements (I think only one in this case), but it is empty instead. I think this is because the website does not load all at once, so when requests.get() goes out and grabs it, it only grabs the first part. So my questions are: 1. Am I correct in my assessment of the problem? 2. If so, is there a way to make requests.get() wait before returning the HTML, or is there another route entirely to get the whole page?

Thanks

Thanks for both responses. I used Selenium and got my script working.
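
The working Selenium script is not shown in the post, so the following is only a minimal sketch of what such an approach might look like, not the asker's actual code. The background is that requests.get() returns the HTML exactly as the server sends it and never runs JavaScript, and the ng-scope class in the XPath suggests the sold-out element is probably added by AngularJS in the browser after the page loads. The sketch assumes Selenium 4 or later and a local Chrome install; the function names class_xpath and find_rendered and the 15-second timeout are made up for illustration. It also matches on the product-soldout class token alone, since an exact match on the full class string breaks as soon as the class order or set changes.

```python
# Sketch only: assumes Selenium 4+ and a local Chrome install.
# The helper names and the timeout are illustrative, not from the original post.


def class_xpath(tag, css_class):
    """Build an XPath matching `tag` elements whose class list contains `css_class`."""
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {css_class} ')]"
    )


def find_rendered(url, xpath, timeout=15):
    """Load `url` in headless Chrome and return the text of elements matching `xpath`.

    Returns an empty list if nothing matches within `timeout` seconds.
    """
    # Imported here so class_xpath() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        try:
            # Wait until the page's JavaScript has inserted at least one match.
            elements = WebDriverWait(driver, timeout).until(
                EC.presence_of_all_elements_located((By.XPATH, xpath))
            )
        except TimeoutException:
            return []
        return [element.text for element in elements]
    finally:
        # Always close the browser, even if loading or waiting fails.
        driver.quit()


# Example call (starts a real browser, so it is left commented out):
# url = "http://www.anthropologie.com/anthro/product/4120200892474.jsp?cm_vc=SEARCH_RESULTS#/"
# sold_out = find_rendered(url, class_xpath("div", "product-soldout"))
# print("sold out" if sold_out else "no sold-out marker found")
```

An explicit wait is used here instead of a fixed sleep so the script returns as soon as the element appears and gives up cleanly when it never does, which is the expected case when the item is in stock.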
