Webdriver/Selenium: How to find an element when it has no class name, id, or CSS selector?


Problem description

Each entry in the "7-pack" search results here shows an address and a phone number down the right-hand side.

For each entry, I want to extract (i) the address and (ii) the phone number. The problem is how these elements are defined in the HTML:

<div style="width:146px;float:left;color:#808080;line-height:18px"><span>Houston, TX</span><br><span>United States</span><br><nobr><span>(713) 766-6663</span></nobr></div>

So there is no class name, CSS selector, or id that I can pass to a find_element_by*() method. I won't know the link text in advance, so I can't use find_element_by_partial_link_text(), and as far as I am aware WebDriver does not provide a method for finding by style. How do we work around this? I need to extract the right data reliably every time, for each search result and for varying queries.

The WebDriver language binding is Python.

Solution

There are at least two key things you can rely on: the container box with id="lclbox", and the elements with class="intrlu", each of which corresponds to one result item.

There is more than one way to extract the address and phone number from each result item. Here is one option (definitely not beautiful) that locates the phone number by checking the text of each span element against a regex:

import re

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome()
driver.get('https://www.google.com/?gws_rd=ssl#q=plumbers%2Bhouston%2Btx')

# wait up to 10 seconds for the local results box to become visible
wait = WebDriverWait(driver, 10)
box = wait.until(EC.visibility_of_element_located((By.ID, "lclbox")))

# matches US phone numbers formatted like "(713) 766-6663"
phone_re = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

for result in box.find_elements(By.CLASS_NAME, "intrlu"):
    for span in result.find_elements(By.TAG_NAME, "span"):
        if phone_re.search(span.text):
            # go up two levels (span -> nobr -> div) to the block
            # that holds both the address and the phone number
            parent = span.find_element(By.XPATH, "../..")
            print(parent.text)
            break
    print("-----")

I'm pretty sure it can be improved, but I hope it gives you a starting point. It prints:

Houston, TX
(713) 812-7070
-----
Houston, TX
(713) 472-5554
-----
6646 Satsuma Dr
Houston, TX
(713) 896-9700
-----
1420 N Durham Dr
Houston, TX
(713) 868-9907
-----
5630 Edgemoor Dr
Houston, TX
(713) 665-5890
-----
5403 Kirby Dr
Houston, TX
(713) 224-3747
-----
Houston, TX
(713) 385-0349
-----
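
The question asks for the address and the phone number separately, while the loop above prints each block as one piece of text. A minimal sketch of one way to split that text follows, assuming each block arrives as newline-separated lines like the output above. The helper name split_address_and_phone is hypothetical and not part of Selenium; it reuses the same phone regex as the answer.

```python
import re

# Same pattern as in the answer: US numbers formatted like "(713) 766-6663".
PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def split_address_and_phone(block_text):
    """Split the text of one result block into (address, phone).

    block_text is assumed to be what parent.text returns: one item per line.
    The first line that matches the phone pattern supplies the phone number;
    every other non-empty line is treated as part of the address.
    The phone is None when no line matches.
    """
    address_lines = []
    phone = None
    for line in block_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = PHONE_RE.search(line)
        if match and phone is None:
            phone = match.group(0)
        else:
            address_lines.append(line)
    return ", ".join(address_lines), phone


address, phone = split_address_and_phone(
    "6646 Satsuma Dr\nHouston, TX\n(713) 896-9700"
)
print(address)
print(phone)
```

In the loop above, this could be called as split_address_and_phone(parent.text) in place of printing parent.text. It only recognizes phone numbers in the "(NNN) NNN-NNNN" format, so results formatted differently would come back with a phone of None.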
