Selenium takes a long time to get the dynamic page of a given URL

Views: 137 Published: 2017/6/25 0:45:08 Tags: java, dom, selenium, selenium-webdriver, jsoup

Question

I am working on a project in Java in which I have to work with the DOM. To do that, I first load the dynamic page of any given URL using Selenium, and then parse the result with Jsoup.

I want to get the dynamic page source of a given URL.

Code snapshot:

public static void main(String[] args) throws IOException {

     // Selenium
     WebDriver driver = new FirefoxDriver();
     driver.get("ANY URL HERE");  
     String html_content = driver.getPageSource();
     driver.close();

     // Jsoup makes DOM here by parsing HTML content
     Document doc = Jsoup.parse(html_content);

     // OPERATIONS USING DOM TREE
}

The problem is that Selenium takes around 95% of the whole processing time, which is undesirable.

Selenium first opens Firefox, then loads the given page, and then gets the dynamic page source.

Can you tell me how I can reduce the time taken by Selenium, or which more efficient tool I could replace it with? Any other advice would also be welcome.
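Before swapping tools, it may help to confirm where the time actually goes. The following is a minimal sketch, assuming a hypothetical `PhaseTimer` helper that is not part of Selenium or Jsoup. The phases in `main` are stand-ins (sleeps and a literal string); in the real program they would wrap `new FirefoxDriver()`, `driver.get(...)`, `driver.getPageSource()` and `Jsoup.parse(...)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Records wall-clock time per named phase (hypothetical helper for illustration). */
final class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the phase, adds its elapsed time under the given name, and returns its result. */
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Sum of all recorded phase times in nanoseconds. */
    long totalNanos() {
        long total = 0;
        for (long n : nanosByPhase.values()) {
            total += n;
        }
        return total;
    }

    /** Share of the total time spent in one phase, as a percentage (0 if nothing was timed). */
    double percentOf(String phase) {
        long total = totalNanos();
        if (total == 0) {
            return 0.0;
        }
        return 100.0 * nanosByPhase.getOrDefault(phase, 0L) / total;
    }

    /** Prints one line per phase in the order the phases were first timed. */
    void report() {
        for (String phase : nanosByPhase.keySet()) {
            System.out.printf("%-12s %6.1f%%%n", phase, percentOf(phase));
        }
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Stand-ins for the real steps from the question's code.
        timer.time("startBrowser", () -> sleep(30));
        timer.time("loadPage", () -> sleep(15));
        String html = timer.time("getSource", () -> "<html></html>");
        timer.time("parse", () -> html.length());
        timer.report();
    }

    /** Simulates slow work; returns a dummy value so it can be used as a Supplier. */
    private static int sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }
}
```

If browser start-up dominates the report, reusing one driver or one profile across URLs is the obvious target; if page loading dominates, the cost lies mostly in the page itself.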

Edit No. 1

This link has the following code:

FirefoxProfile profile = new FirefoxProfile();
profile.setPreference("general.useragent.override", "some UA string");
WebDriver driver = new FirefoxDriver(profile);

But I don't understand what the second line here does, and Selenium's documentation is also very poor.

Edit No. 2

// printf substitutes the URL for %s; concatenating with + would print "%s" literally
System.out.printf("Fetching %s...%n", url1);
System.out.printf("Fetching %s...%n", url2);

// One browser instance is reused for both pages
WebDriver driver = new FirefoxDriver(createFirefoxProfile());

driver.get(url1);
String html1 = driver.getPageSource();

driver.get(url2);
String html2 = driver.getPageSource();
driver.close();

Document doc1 = Jsoup.parse(html1);
Document doc2 = Jsoup.parse(html2);

Recommended answer

Try this:

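// Assumed imports; FileUtils is taken to be the Apache Commons IO class.
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;

public static void main(String[] args) throws IOException {

    // Selenium: start Firefox with the cached profile
    WebDriver driver = new FirefoxDriver(createFirefoxProfile());
    driver.get("ANY URL HERE");
    String html_content = driver.getPageSource();
    driver.close();

    // Jsoup makes DOM here by parsing HTML content
    // OPERATIONS USING DOM TREE
}

// Reuses the profile directory left by a previous run, or creates and caches one.
private static FirefoxProfile createFirefoxProfile() {
    File profileDir = new File("/tmp/firefox-profile-dir");
    if (profileDir.exists())
        return new FirefoxProfile(profileDir);
    FirefoxProfile firefoxProfile = new FirefoxProfile();
    // Write the fresh profile to disk, then copy it to the fixed location
    File dir = firefoxProfile.layoutOnDisk();
    try {
        profileDir.mkdirs();
        FileUtils.copyDirectory(dir, profileDir);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return firefoxProfile;
}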

The createFirefoxProfile() method creates a profile if one doesn't exist. If a profile already exists, it uses that one. So Selenium doesn't need to create the profile-dir structure every time.
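The "reuse if present, otherwise create and cache" idea can be sketched without Selenium, using only the JDK. Everything here is an assumption made for illustration: the class name `ProfileDirCache`, the method `reuseOrCreate`, and the `freshLayout` supplier, which stands in for the expensive `firefoxProfile.layoutOnDisk()` step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Supplier;
import java.util.stream.Stream;

/** Caches a generated directory at a fixed location (hypothetical helper for illustration). */
final class ProfileDirCache {

    /**
     * Returns true if cacheDir already existed and was reused. Otherwise runs the
     * expensive freshLayout step once, copies its output into cacheDir, and returns false.
     */
    static boolean reuseOrCreate(Path cacheDir, Supplier<Path> freshLayout) throws IOException {
        if (Files.isDirectory(cacheDir)) {
            return true; // fast path: nothing to build
        }
        Path source = freshLayout.get(); // expensive step, only on the first run
        copyTree(source, cacheDir);
        return false;
    }

    /** Recursively copies source into target; Files.walk visits parents before children. */
    private static void copyTree(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path cacheDir = Files.createTempDirectory("demo").resolve("firefox-profile-dir");
        // Stand-in for generating a fresh browser profile on disk.
        Supplier<Path> freshLayout = () -> {
            try {
                Path dir = Files.createTempDirectory("fresh-profile");
                Files.write(dir.resolve("prefs.js"),
                        "// demo prefs\n".getBytes(StandardCharsets.UTF_8));
                return dir;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
        System.out.println("reused: " + reuseOrCreate(cacheDir, freshLayout));
        System.out.println("reused: " + reuseOrCreate(cacheDir, freshLayout));
    }
}
```

The first call builds and copies the directory; the second finds it and skips the expensive step. The same caveat applies as in the answer's code: a directory left behind by a failed copy would still be treated as a valid cache.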
