Selenium - driver.getPageSource() differs from the source viewed in the browser

Problem description

I am trying to use Selenium to save the source code of a given URL to an HTML file, but for some reason the output does not exactly match the source I see in the browser.

Below is my Java code for saving the source to an HTML file:

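import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

private static void getHTMLSourceFromURL(String url, String fileName) {

    WebDriver driver = new FirefoxDriver();

    try {
        driver.get(url);
        Thread.sleep(5000);   // crude fixed wait to give the page time to finish loading

        // getPageSource() returns the page as a single string; split it into lines
        List<String> pageSource = new ArrayList<String>(Arrays.asList(driver.getPageSource().split("\n")));

        // write to the file named by the fileName parameter
        writeTextToFile(pageSource, fileName);

    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();   // preserve the interrupt status
        e.printStackTrace();
    } finally {
        // quit in finally so the browser closes even if something above throws
        System.out.println("quitting webdriver");
        driver.quit();
    }
}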

/**
 * Creates the file fileName under "./HTML Sources" (if it does not exist yet)
 * and appends the content to it, one entry per line.
 *
 * @param content  lines of text to write
 * @param fileName name of the output file inside the "HTML Sources" folder
 */
private static void writeTextToFile(List<String> content, String fileName) {
    File dir = new File(".", "HTML Sources");
    // mkdirs() is only attempted when the directory is missing
    if (!dir.exists() && !dir.mkdirs()) {
        System.err.println(dir + " could not be created");
        return;
    }

    File output = new File(dir, fileName);
    // FileWriter creates the file if needed; 'true' appends to existing content.
    // try-with-resources closes the writer even if an exception is thrown.
    try (PrintWriter pw = new PrintWriter(new FileWriter(output, true))) {
        for (String line : content) {
            pw.print(line);
            pw.print("\n");
        }
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
}

Can someone shed some light on why this happens? How does WebDriver render the page, and how does the browser show the source?

Recommended answer

The "source" you get from Selenium does not appear to be the original source at all; it appears to be HTML serialized from the current DOM. The source you see with the browser's View Source command is the HTML as delivered by the server, before JavaScript makes any dynamic changes to it. If scripts change the DOM after the page loads, View Source does not reflect those changes, but Selenium's output does. To see the current DOM in a browser, use the developer tools rather than View Source.
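If the goal is a file that matches View Source, one possible approach is to request the page directly over HTTP instead of going through WebDriver, since no JavaScript runs on a plain HTTP response. The following is a minimal sketch under stated assumptions, not part of the original answer: it assumes Java 11 or later (for java.net.http), and the class name RawSourceFetcher and the helpers fetchRawSource and firstDifferingLine are hypothetical names chosen for illustration. A plain request carries no browser cookies or session, so the server may still return something different from what the browser received.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class, not part of Selenium.
class RawSourceFetcher {

    /** Returns the response body as sent by the server; no JavaScript is executed. */
    static String fetchRawSource(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    /**
     * Returns the 0-based index of the first line where the two texts differ,
     * or -1 if they are identical line for line.
     */
    static int firstDifferingLine(String a, String b) {
        String[] x = a.split("\n", -1);
        String[] y = b.split("\n", -1);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (!x[i].equals(y[i])) {
                return i;
            }
        }
        // All shared lines match; the texts differ only if one is longer.
        return x.length == y.length ? -1 : n;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java RawSourceFetcher <url>");
            return;
        }
        // Print the server-delivered HTML for the given URL.
        System.out.println(fetchRawSource(args[0]));
    }
}
```

Passing the string from fetchRawSource and the string from driver.getPageSource() to firstDifferingLine gives a rough idea of where the DOM starts to diverge from the server-delivered HTML.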
