Selenium Webdriver not finding XPATH despite seemingly identical strings

Problem description

This question is related to my previous two: "Inducing WebDriverWait for specific elements" and "Inconsistency in scraping through <div>'s in Selenium".

I am scraping all of the Air Jordan sneakers from https://www.grailed.com/. The feed is an infinitely scrolling list of sneakers, and I am using Selenium WebDriver to scrape the data. My problem is that the images for the shoes seem to take a while to load, which throws a lot of errors. I have found a pattern in the XPaths of the images. The XPath to the first image is /html/body/div[3]/div[6]/div[3]/div[3]/div[2]/div[2]/div[1]/a/div[2]/img, the second is /html/body/div[3]/div[6]/div[3]/div[3]/div[2]/div[2]/div[2]/a/div[2]/img, and so on. They follow a linear sequence in which the index of the second-to-last div increases by one each time. To handle this I put the following in my loop (only relevant code is included).

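    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # driver, sneakers, sneaker_count and last_height are defined earlier (omitted)
    i = 1
    while len(sneakers) < sneaker_count:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Get sneakers currently on page and add to sneakers list
        feed = driver.find_elements(By.CLASS_NAME, 'feed-item')
        for item in feed:
            # Absolute XPath to the image of the i-th feed item
            xpath = "/html/body/div[3]/div[6]/div[3]/div[3]/div[2]/div[2]/div[" + str(i) + "]/a/div[2]/img"
            img = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, xpath)))
            i += 1
        # Stop once scrolling no longer increases the page height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height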

The issue is that after about the 5th pair of shoes, the wait statement times out. It seems that the XPath passed in after that pair is not recognized. I checked the XPath with the "Copy XPath" feature in Firefox Developer Edition, and it looks identical to the one I pass in when I print it. I use ChromeDriver with Selenium, but I don't think that's relevant. Does anyone know why the XPaths stop being recognized even though they seem identical?

UPDATE: Using an XPath checker add-on for Chrome, the XPaths for items 1-4 are detected, but detection often stops after item 6. When I inspect the elements in the developer tools of both Chrome and Firefox, the XPath still looks identical, yet the "CSS and XPath checker" add-on does not find a match. This is a huge mystery to me.

Solution

I found the problem. The XPath was fine, but after the first 4-5 elements the images are lazy-loaded. It's not that the images take too long to load; the HTML for those items contains only placeholders until the images are actually loaded. Scraping these images therefore requires a different approach than waiting on the XPath.
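
One possible way to deal with lazy-loaded images is to scroll each feed item into view and poll until a real image URL shows up, rather than waiting on an absolute XPath. The following is only a minimal sketch under assumptions that the answer does not confirm: the helper names `real_image_url` and `collect_image_urls` are hypothetical, the `data-src` attribute and the placeholder markers are guesses about how the page's lazy loader might behave, and the string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, written out so the block has no Selenium import.

```python
import time


def real_image_url(img, placeholder_markers=("data:image", "placeholder")):
    """Return a usable URL for an <img> element, or None if it is still a placeholder.

    Assumption: a lazy loader may keep the real URL in `data-src` while `src`
    holds a placeholder. Adjust the attribute names to what the page really uses.
    """
    for attr in ("src", "data-src"):
        url = img.get_attribute(attr)
        if url and not any(marker in url for marker in placeholder_markers):
            return url
    return None


def collect_image_urls(driver, items, timeout=10.0, poll=0.25):
    """Scroll each item into view and poll for its image URL.

    `items` are WebElements for the feed entries (e.g. the 'feed-item'
    elements from the question). Items whose image never appears within
    `timeout` seconds yield None instead of raising.
    """
    urls = []
    for item in items:
        # Bringing the item into the viewport is what usually triggers lazy loading.
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'});", item
        )
        deadline = time.monotonic() + timeout
        while True:
            # Look for the image relative to the item, not via an absolute XPath.
            imgs = item.find_elements("css selector", "img")
            url = real_image_url(imgs[0]) if imgs else None
            if url or time.monotonic() >= deadline:
                break
            time.sleep(poll)
        urls.append(url)
    return urls
```

With a live session this could be called as `collect_image_urls(driver, driver.find_elements(By.CLASS_NAME, 'feed-item'))`. Locating the image relative to each feed item also avoids depending on the exact `div` indices of the page layout.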
