Scraping result is different from inspected DOM element


Question

I want to parse a list of prices from a web page using Selenium WebDriver in Python. To do that, I tried to fetch all the DOM elements with this code:

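from selenium import webdriver

# Open the page in Chrome and print the HTML as Selenium sees it right after loading
url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018-01-09'
driver = webdriver.Chrome()
driver.get(url)

print(driver.page_source)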

The problem is that what I get from page_source is different from what I see when I inspect the element in the browser:

<div class="CTPFVNB-f-a">
    <div class="CTPFVNB-f-c"></div>
    <div class="CTPFVNB-f-d" elt="toolbelt"></div>
    <div class="CTPFVNB-f-e" elt="result">Here is the difference</div>
</div>

The difference is inside the element with class CTPFVNB-f-e. In the inspected DOM, this tag holds all the prices that I want to fetch, but in the output of page_source that part is missing.

Could anyone tell me what is wrong with my code? Or do I need further steps to parse the list of prices?

Recommended answer

JavaScript modifies the page after it loads. Because you print the page source immediately after opening the page, you get the initial markup, before the JavaScript has run.

You can do any one of the following:

  • Add a delay: call time.sleep(x), where x is the number of seconds to wait; adjust it to your needs. (NOT recommended)
  • Implicit wait: call driver.implicitly_wait(x), with x in seconds as above. Note that this only makes element lookups such as find_element wait; it does not delay page_source by itself.
  • Explicit wait: wait for the HTML element to appear, then get the page source. To learn how to do this, refer to this link. (HIGHLY recommended)

An explicit wait is the better option here because it waits only as long as the element needs to appear, so it adds no unnecessary delay. With a fixed delay or an implicit wait, by contrast, you will not get the desired output if the page loads more slowly than expected.
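To show why polling for a condition avoids excess delay, here is a simplified, standard-library-only sketch of the idea behind an explicit wait. It is not Selenium's actual implementation; wait_until and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, so the caller waits only as
    long as needed. Raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)
```

A fixed time.sleep(x) always costs x seconds and still fails when the page needs longer than x, whereas a loop like this returns on the first successful check and only uses the full timeout in the failure case.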
