How to scrape all the search results using Selenium webdriver and Python


Question

I'm trying to scrape every CRD# from the search results on this site: https://brokercheck.finra.org/search/genericsearch/list

(You'll need to redo the search after clicking the link; just type some random text into the Individual search.)

I'm using driver.find_elements_by_xpath to target all CRD numbers on each result page. I've been experimenting with different paths for a while, but the webdriver still can't pick up the CRDs from the site.

This is what I currently have (in Python):

crds = driver.find_elements_by_xpath("//md-list-item/div/div/div/div/div/bc-bio-geo-section/div/div/div/div/div/span")

But the result is always empty.

Recommended answer

To print all the CRD# values from the search results on https://brokercheck.finra.org/search/genericsearch/grid using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR and get_attribute():

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.ng-binding[ng-bind-html='vm.item.id']")))])

  • Using XPATH and text:

    print([my_elem.text for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[starts-with(., 'CRD')]//following-sibling::span[1]")))])
    

  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
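
The locators above read the CRD numbers from the result page that is currently loaded. To gather them across every result page, one option is a small loop that reads a page, moves on, and stops when there is no next page. The sketch below is browser-independent and is only an illustration: collect_all_pages, read_page, go_to_next_page and max_pages are hypothetical names, not part of Selenium, and the fake pages stand in for a real browser session.

```python
from typing import Callable, Iterable, List


def collect_all_pages(
    read_page: Callable[[], Iterable[str]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 50,
) -> List[str]:
    """Gather values from consecutive result pages, keeping first-seen order.

    read_page returns the values visible on the current page.
    go_to_next_page returns False when there is no further page.
    max_pages is a safety cap so a misbehaving site cannot loop forever.
    """
    seen = set()
    collected: List[str] = []
    for _ in range(max_pages):
        for value in read_page():
            value = value.strip()
            # Skip blanks and values already collected from an earlier page.
            if value and value not in seen:
                seen.add(value)
                collected.append(value)
        if not go_to_next_page():
            break
    return collected


# Stand-in for a browser session so the loop can be exercised offline.
fake_pages = [["1001", "1002 "], ["1002", "1003"], ["1004"]]
position = {"index": 0}


def read_fake_page():
    return fake_pages[position["index"]]


def go_to_next_fake_page():
    if position["index"] + 1 >= len(fake_pages):
        return False
    position["index"] += 1
    return True


all_crds = collect_all_pages(read_fake_page, go_to_next_fake_page)
print(all_crds)  # ['1001', '1002', '1003', '1004']
```

With a real driver, read_page could wrap either of the list comprehensions above, and go_to_next_page would click the site's next-page control and return False once that control is missing or disabled. The locator for that control is not given here, so it would have to be taken from the page itself.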
    
