How to check if a web page's content has been changed using Selenium's webdriver with Python?

Problem description

Comparing old_page_source with new_page_source at 20-second intervals has not worked for me.

import time

from selenium import webdriver

# Using Google Chrome as the browser. Newer Selenium 4 releases expect the
# driver path via service=Service('chromedriverfilepath') instead.
driver = webdriver.Chrome('chromedriverfilepath')

# 5 trials to see how often the page gets updated. Currently unsuccessful.
for x in range(1, 6):
    # The webpage being analyzed (driver.get needs the URL scheme).
    driver.get("http://www.somewebsite.com")

    old_page_source = driver.page_source

    print("\n\nTRIAL %d, first page fetched at time.... " % x + time.strftime('Time: %H:%M:%S'))

    driver.get("http://www.somewebsite.com")
    new_page_source = driver.page_source

    # Keep checking every 20 seconds until the page is updated/changed.
    while old_page_source == new_page_source:
        time.sleep(20)
        driver.get("http://www.somewebsite.com")
        new_page_source = driver.page_source

    print("page was changed at time.... " + time.strftime('Time: %H:%M:%S'))

Recommended answer

You cannot rely on page_source for what you are doing. What Selenium reports is most likely what the browser first received, not the page as it stands now. As the docs mention:

Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist's impression.

(Emphasis mine. The doc is for the Java bindings, but the behavior is determined not by the Java bindings but by the part of Selenium that lives browser-side, so this applies to the Python bindings too.)

To get the actual state of the page, you should instead run:

driver.execute_script("return document.documentElement.outerHTML")

This will give you a serialization of the DOM tree of the entire page.
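
As a rough sketch of how that call could replace the page_source comparison in the question, the polling loop might look something like the following. The helper names dom_fingerprint and wait_for_dom_change, the max_checks limit, the reload_url parameter, and the use of a SHA-256 digest are all assumptions made for illustration; they are not part of Selenium. The only Selenium calls used are driver.execute_script and driver.get.

```python
import hashlib
import time


def dom_fingerprint(driver):
    """Return a SHA-256 hex digest of the page's current serialized DOM."""
    html = driver.execute_script("return document.documentElement.outerHTML")
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def wait_for_dom_change(driver, interval=20, max_checks=30,
                        reload_url=None, sleep=time.sleep):
    """Poll until the DOM fingerprint differs from the first one taken.

    Returns True if a change was seen, False if max_checks ran out.
    If reload_url is given (hypothetical parameter), the page is fetched
    again before each check, as the question's loop does.
    """
    baseline = dom_fingerprint(driver)
    for _ in range(max_checks):
        sleep(interval)
        if reload_url is not None:
            driver.get(reload_url)
        if dom_fingerprint(driver) != baseline:
            return True
    return False
```

With a real browser this would be called as wait_for_dom_change(driver) after driver.get(...). Note that any markup that differs on every load (timestamps, rotating ads, session tokens) will register as a change, so comparing a specific element's outerHTML may be more useful than comparing the whole document.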
