Selenium takes a long time to get the dynamic page of a given URL


Question

I am working on a project in Java in which I have to work with the DOM. To do that, I first load the dynamic page of a given URL using Selenium, and then I parse it with Jsoup.

I want to get the dynamic page source of a given URL.

Code snapshot:

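import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public static void main(String[] args) throws IOException {

     // Selenium: start Firefox, load the page, and grab the rendered HTML
     WebDriver driver = new FirefoxDriver();
     String html_content;
     try {
         driver.get("ANY URL HERE");
         html_content = driver.getPageSource();
     } finally {
         driver.quit(); // ends the whole browser session, not just the window
     }

     // Jsoup makes DOM here by parsing HTML content
     Document doc = Jsoup.parse(html_content);

     // OPERATIONS USING DOM TREE
}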

But the problem is that Selenium accounts for around 95% of the total processing time, which is undesirable.

Selenium first opens Firefox, then loads the given page, and only then gets the dynamic page source.
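To confirm which of those steps dominates, it can help to time each one separately. The following is a minimal JDK-only sketch; `PhaseTimer` is a hypothetical helper, not part of Selenium or Jsoup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper (not part of Selenium or Jsoup): records how long each
// labeled phase takes, in milliseconds, in the order the phases ran.
final class PhaseTimer {
    private final Map<String, Long> millis = new LinkedHashMap<>();

    // Runs one phase, records its wall-clock duration (even if it throws),
    // and returns the phase's result unchanged.
    <T> T time(String label, Supplier<T> phase) {
        long start = System.nanoTime();
        try {
            return phase.get();
        } finally {
            millis.put(label, (System.nanoTime() - start) / 1_000_000);
        }
    }

    // Durations keyed by label, in insertion order.
    Map<String, Long> results() {
        return millis;
    }
}
```

In the real program you would wrap each call, for example `String html = timer.time("load page", () -> { driver.get(url); return driver.getPageSource(); });`, do the same for `new FirefoxDriver()` and `Jsoup.parse(html)`, and then print `timer.results()` to see whether browser startup or page loading takes most of the time.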

Can you tell me how to reduce the time taken by Selenium, for example by replacing it with another, more efficient tool? Any other advice is also welcome.

Edit 1

I found the following code at this link:

FirefoxProfile profile = new FirefoxProfile();
profile.setPreference("general.useragent.override", "some UA string");
WebDriver driver = new FirefoxDriver(profile);

But I don't understand what the second line does, and Selenium's documentation is not much help either.

Edit 2

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxOptions;

System.out.printf("Fetching %s...%n", url1);
System.out.printf("Fetching %s...%n", url2);

// One browser instance is started once and reused for both URLs
WebDriver driver = new FirefoxDriver(new FirefoxOptions().setProfile(createFirefoxProfile()));

driver.get(url1);
String hml1 = driver.getPageSource();

driver.get(url2);
String hml2 = driver.getPageSource();
driver.quit();

Document doc1 = Jsoup.parse(hml1);
Document doc2 = Jsoup.parse(hml2);
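The Edit 2 code starts Firefox once and reuses it for both pages, so the browser startup cost is paid once rather than once per URL. A minimal JDK-only sketch of the same pattern for any number of URLs follows; `PageFetcher` and `BatchFetch` are hypothetical names, and a real `PageFetcher` would wrap a `WebDriver` (`driver.get(url)` plus `driver.getPageSource()` in `fetch`, `driver.quit()` in `close`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical abstraction over one browser session.
interface PageFetcher extends AutoCloseable {
    String fetch(String url);   // load the URL and return the rendered HTML

    @Override
    void close();               // narrowed: throws no checked exception
}

final class BatchFetch {
    private BatchFetch() {}

    // Starts ONE fetcher, uses it for every URL, and closes it exactly once,
    // even if a fetch throws. Startup cost is paid once instead of per URL.
    static List<String> fetchAll(Supplier<PageFetcher> startFetcher, List<String> urls) {
        try (PageFetcher fetcher = startFetcher.get()) {
            List<String> pages = new ArrayList<>(urls.size());
            for (String url : urls) {
                pages.add(fetcher.fetch(url));
            }
            return pages;
        }
    }
}
```

The try-with-resources block plays the role of the explicit `driver.quit()` call, so the browser is shut down even when one of the page loads fails.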


Recommended answer

Try this:

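import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxOptions;
import org.openqa.selenium.firefox.FirefoxProfile;

public static void main(String[] args) throws IOException {

    // Selenium: start Firefox with a reused on-disk profile
    // (current Selenium takes the profile via FirefoxOptions; older versions
    // accepted new FirefoxDriver(profile) directly)
    WebDriver driver = new FirefoxDriver(new FirefoxOptions().setProfile(createFirefoxProfile()));
    driver.get("ANY URL HERE");
    String html_content = driver.getPageSource();
    driver.quit();

    // Jsoup makes DOM here by parsing HTML content
    Document doc = Jsoup.parse(html_content);
    // OPERATIONS USING DOM TREE
}

// Reuses /tmp/firefox-profile-dir if it exists; otherwise lays out a fresh
// profile and copies it there so later runs can skip that step.
private static FirefoxProfile createFirefoxProfile() {
    File profileDir = new File("/tmp/firefox-profile-dir");
    if (profileDir.exists())
        return new FirefoxProfile(profileDir);
    FirefoxProfile firefoxProfile = new FirefoxProfile();
    File dir = firefoxProfile.layoutOnDisk();
    try {
        profileDir.mkdirs();
        FileUtils.copyDirectory(dir, profileDir);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return firefoxProfile;
}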

The createFirefoxProfile() method creates a profile if one doesn't exist and reuses it if it already does, so Selenium doesn't need to create the profile directory structure every time.
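The same create-once, reuse-afterwards logic can be sketched with the JDK alone, which also removes the dependency on Commons IO's `FileUtils.copyDirectory`. This is an illustration, not the answer's code: `ProfileCache` and `LayoutStep` are hypothetical names, and in practice `LayoutStep` would wrap `firefoxProfile.layoutOnDisk()`, with the returned path passed to `new FirefoxProfile(path.toFile())`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

final class ProfileCache {
    private ProfileCache() {}

    // Hypothetical hook for the expensive step, e.g. laying out a new profile.
    @FunctionalInterface
    interface LayoutStep {
        Path layOut() throws IOException;
    }

    // If cacheDir already exists it is returned as-is and layoutStep never runs;
    // otherwise the freshly laid-out directory is copied into cacheDir first.
    static Path createOrReuse(Path cacheDir, LayoutStep layoutStep) throws IOException {
        if (Files.isDirectory(cacheDir)) {
            return cacheDir;
        }
        copyRecursively(layoutStep.layOut(), cacheDir);
        return cacheDir;
    }

    // JDK-only stand-in for FileUtils.copyDirectory(src, dest).
    // Files.walk visits each directory before its contents (pre-order).
    static void copyRecursively(Path src, Path dest) throws IOException {
        try (Stream<Path> paths = Files.walk(src)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path target = dest.resolve(src.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

On the first call the layout step runs and its output is copied into the cache directory; every later call finds the directory and returns immediately, which is the work createFirefoxProfile() is meant to skip.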
