Selenium throws StaleElementReferenceException


Problem description

I am trying to build a web crawler with Selenium. My program throws a StaleElementReferenceException. I thought this happened because I crawl pages recursively, and when a page has no more links the function navigates to the next page instead of first going back to the parent page.

So I introduced a tree data structure to navigate back to the parent whenever the current URL does not equal the parent URL. But that did not solve my problem.

Can anybody help me?

Code:

public class crawler {
    private static FirefoxDriver driver;
    private static String main_url = "https://robhammond.co/tools/seo-crawler";
    private static List<String> uniqueLinks = new ArrayList<String>();

    public static void main(String[] args) {
        driver = new FirefoxDriver();

        Node<String> root = new Node<>(main_url);

        scrap(root, main_url);
    }

    public static void scrap(Node<String> node, String url) {
        if(node.getParent() != null && (!driver.getCurrentUrl().equals(node.getParent().getData()))) {
            driver.navigate().to(node.getParent().getData());
        }

        driver.navigate().to(url);

        List<WebElement> allLinks = driver.findElements(By.tagName("a"));

        for(WebElement link : allLinks) {
            if(link.getAttribute("href").contains(main_url) && !uniqueLinks.contains(link.getAttribute("href")) && link.isDisplayed()) {
                uniqueLinks.add(link.getAttribute("href"));

                System.out.println(link.getAttribute("href"));

                scrap(new Node<>(link.getAttribute("href")), link.getAttribute("href"));
            }
        }
    }
}

This is the console output:

D:\Programme\openjdk-12.0.1_windows-x64_bin\jdk-12.0.1\bin\java.exe "-javaagent:D:\Programme\JetBrains\IntelliJ IDEA 2019.1.2\lib\idea_rt.jar=60461:D:\Programme\JetBrains\IntelliJ IDEA 2019.1.2\bin" -Dfile.encoding=UTF-8 -classpath C:\Users\admin\Desktop\SeleniumWebScraper\out\production\SeleniumWebScraper;D:\Downloads\selenium-server-standalone-3.141.59.jar de.company.crawler.crawler
1557924446770   mozrunner::runner   INFO    Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "-marionette" "-foreground" "-no-remote" "-profile" "C:\\Users\\admin\\AppData\\Local\\Temp\\rust_mozprofile.YqmEqE8y1pjv"
1557924447037   addons.webextension.screenshots@mozilla.org WARN    Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: mozillaAddons
1557924447037   addons.webextension.screenshots@mozilla.org WARN    Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: resource://pdf.js/
1557924447037   addons.webextension.screenshots@mozilla.org WARN    Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: about:reader*
1557924448047   Marionette  INFO    Listening on port 60468
1557924448383   Marionette  WARN    TLS certificate errors will be ignored for this session
Mai 15, 2019 2:47:28 NACHM. org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
JavaScript warning: https://robhammond.co/js/jquery.min.js, line 4: Using //@ to indicate sourceMappingURL pragmas is deprecated. Use //# instead
https://robhammond.co/tools/seo-crawler#content
https://twitter.com/intent/tweet?text=SEO%20Crawler&url=https://robhammond.co/tools/seo-crawler&via=robhammond
Exception in thread "main" org.openqa.selenium.StaleElementReferenceException: The element reference of <a href="/tools/"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/stale_element_reference.html
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:25:53'
System info: host: 'DESKTOP-admin', ip: '192.168.233.1', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '12.0.1'
Driver info: org.openqa.selenium.firefox.FirefoxDriver
Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 66.0.5, javascriptEnabled: true, moz:accessibilityChecks: false, moz:geckodriverVersion: 0.24.0, moz:headless: false, moz:processID: 19124, moz:profile: C:\Users\admin\AppData\Loca..., moz:shutdownTimeout: 60000, moz:useNonSpecCompliantPointerOrigin: false, moz:webdriverClick: true, pageLoadStrategy: normal, platform: WINDOWS, platformName: WINDOWS, platformVersion: 10.0, rotatable: false, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
Session ID: b3b87675-57c8-4b48-9a20-8df5e4d37503
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
    at org.openqa.selenium.remote.http.W3CHttpResponseCodec.createException(W3CHttpResponseCodec.java:187)
    at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:122)
    at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:49)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:158)
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552)
    at org.openqa.selenium.remote.RemoteWebElement.execute(RemoteWebElement.java:285)
    at org.openqa.selenium.remote.RemoteWebElement.getAttribute(RemoteWebElement.java:134)
    at de.company.crawler.crawler.scrap(crawler.java:33)
    at de.company.crawler.crawler.scrap(crawler.java:38)
    at de.company.crawler.crawler.main(crawler.java:20)

Process finished with exit code 1

Recommended answer

When you navigate away from the first page, all WebElements in the allLinks list are lost: each reference is bound to the DOM of the page on which it was found, so it goes stale as soon as another page loads.

I would recommend converting the list of WebElements into a list of plain Strings, like this:

List<String> allLinksHrefs = allLinks.stream().map(link -> link.getAttribute("href")).collect(Collectors.toList());

and then iterating over this new allLinksHrefs list.
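To illustrate the recommendation, here is a minimal sketch of the recursive crawl restructured around plain Strings. It is not the asker's code: the class name CrawlSketch, the loadHrefs function, and the example.test URLs are hypothetical, and the page loader is passed in as a function so the sketch runs without a browser. Matching on startsWith instead of contains, and skipping null hrefs, are also assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class CrawlSketch {
    // URLs already visited, kept in discovery order.
    private final Set<String> visited = new LinkedHashSet<>();
    private final String prefix;
    // Hypothetical page loader: given a URL, returns the hrefs found on that page.
    // With Selenium this could be, for example:
    //   url -> { driver.navigate().to(url);
    //            return driver.findElements(By.tagName("a")).stream()
    //                         .map(e -> e.getAttribute("href"))
    //                         .collect(Collectors.toList()); }
    private final Function<String, List<String>> loadHrefs;

    public CrawlSketch(String prefix, Function<String, List<String>> loadHrefs) {
        this.prefix = prefix;
        this.loadHrefs = loadHrefs;
    }

    public Set<String> crawl(String url) {
        if (!visited.add(url)) {
            return visited; // already crawled
        }
        // Snapshot the hrefs as Strings BEFORE recursing. Strings stay valid
        // after the browser navigates away; WebElement references do not.
        List<String> hrefs = new ArrayList<>(loadHrefs.apply(url));
        for (String href : hrefs) {
            // Anchors without an href attribute yield null, so guard against it.
            if (href != null && href.startsWith(prefix)) {
                crawl(href);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A fake three-page site standing in for the browser.
        Map<String, List<String>> fakeSite = Map.of(
                "https://example.test/", List.of("https://example.test/a", "https://other.test/x"),
                "https://example.test/a", List.of("https://example.test/", "https://example.test/b"),
                "https://example.test/b", List.of());
        CrawlSketch crawler = new CrawlSketch("https://example.test/",
                url -> fakeSite.getOrDefault(url, List.of()));
        System.out.println(crawler.crawl("https://example.test/"));
    }
}
```

Because every page's links are reduced to Strings right after the page loads, the loop never touches an element from a page the browser has already left, so there is no need to navigate back to the parent page before continuing.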
