Scraping iframe using Selenium


Question

I want to scrape ads from websites, but many of them are dynamic and are DOM objects. For example, in this snippet

I can get the iframe tag with Selenium, but I cannot go any further. I think this is because of the XPath: the XPath of the <html> inside the iframe is /html, which is the same as that of the main page's <html>.

This is the line of code used:

element = WebDriverWait(self.driver,20).until(EC.presence_of_all_elements_located((By.XPATH, '/html')))

Any suggestions?

Recommended answer

By default, the selenium.webdriver object points at the top-level page it has loaded. To read an iframe's contents, you first have to switch to that iframe.

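from selenium import webdriver
from selenium.webdriver.common.by import By

# Recent Selenium 4 releases locate chromedriver themselves;
# the executable_path argument is no longer accepted.
driver = webdriver.Chrome()

# Find the frame by id, title, etc. (find_elements_by_xpath was removed in Selenium 4).
frames = driver.find_elements(By.XPATH, "//iframe[@title='iframe_to_get']")

# Switch the webdriver object to the iframe (here the first match).
driver.switch_to.frame(frames[0])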

Always remember to switch back to the default content when you are done with an iframe, especially when iterating over several iframes. Otherwise the driver stays inside the current iframe and you won't be able to switch to the other iframes in the same code.

driver.switch_to.default_content()

Update

The functions below are now deprecated, so I have updated the answer.

driver.switch_to_frame('Any frame')  # deprecated: use driver.switch_to.frame(...)
driver.switch_to_default_content()  # deprecated: use driver.switch_to.default_content()
