Selenium takes a lot of time to get the dynamic page of a given URL
Question
I am working on a project in Java in which I have to work with the DOM. To do that, I first load the dynamic page for any given URL using Selenium, then parse the result with Jsoup.
I want to get the dynamic page source of a given URL.
Code snapshot:
public static void main(String[] args) throws IOException {
    // Selenium
    WebDriver driver = new FirefoxDriver();
    driver.get("ANY URL HERE");
    String html_content = driver.getPageSource();
    driver.close();

    // Jsoup makes DOM here by parsing HTML content
    Document doc = Jsoup.parse(html_content);

    // OPERATIONS USING DOM TREE
}
The problem is that Selenium takes around 95% of the whole processing time, which is undesirable.
Selenium first opens Firefox, then loads the given page, then gets the dynamic page source.
Can you tell me how I can reduce the time taken by Selenium, or which more efficient tool I could replace it with? Any other advice would also be welcome.
Edit no. 1
From this link:
FirefoxProfile profile = new FirefoxProfile();
profile.setPreference("general.useragent.override", "some UA string");
WebDriver driver = new FirefoxDriver(profile);
But I don't understand what the second line here does, and Selenium's documentation is very poor.
Edit no. 2
System.out.printf("Fetching %s...%n", url1);
System.out.printf("Fetching %s...%n", url2);

WebDriver driver = new FirefoxDriver(createFirefoxProfile());
driver.get(url1);
String html1 = driver.getPageSource();
driver.get(url2);
String html2 = driver.getPageSource();
driver.close();

Document doc1 = Jsoup.parse(html1);
Document doc2 = Jsoup.parse(html2);
Recommended answer
Try this:
public static void main(String[] args) throws IOException {
    // Selenium
    WebDriver driver = new FirefoxDriver(createFirefoxProfile());
    driver.get("ANY URL HERE");
    String html_content = driver.getPageSource();
    driver.close();

    // Jsoup makes DOM here by parsing HTML content
    // OPERATIONS USING DOM TREE
}

private static FirefoxProfile createFirefoxProfile() {
    File profileDir = new File("/tmp/firefox-profile-dir");
    if (profileDir.exists())
        return new FirefoxProfile(profileDir);

    FirefoxProfile firefoxProfile = new FirefoxProfile();
    File dir = firefoxProfile.layoutOnDisk();
    try {
        profileDir.mkdirs();
        FileUtils.copyDirectory(dir, profileDir);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return firefoxProfile;
}
The createFirefoxProfile() method creates a profile directory if one doesn't exist yet. If the profile directory already exists, it is reused, so Selenium doesn't need to build the profile directory structure on every run.
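The create-once, reuse-afterwards idea can be tried out on its own, without Selenium. The following is only a minimal sketch using the standard library: ProfileDirCache, ProfileBuilder and getOrCreate are hypothetical names made up for this illustration, and ProfileBuilder stands in for the expensive step that firefoxProfile.layoutOnDisk() performs in the code above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class ProfileDirCache {

    /** Hypothetical stand-in for the expensive step that lays out a fresh profile on disk. */
    public interface ProfileBuilder {
        Path buildFreshProfile() throws IOException;
    }

    /** Returns cachedDir, building and copying a fresh profile into it only if it is missing. */
    public static Path getOrCreate(Path cachedDir, ProfileBuilder builder) throws IOException {
        if (Files.isDirectory(cachedDir)) {
            return cachedDir; // fast path: reuse what an earlier run left behind
        }
        Path fresh = builder.buildFreshProfile(); // slow path: happens once
        copyDirectory(fresh, cachedDir);
        return cachedDir;
    }

    /** Recursively copies source into target (target must not contain the same files yet). */
    static void copyDirectory(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            // Files.walk lists a directory before its children, so parents are created first.
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption for the demo: a temp location instead of /tmp/firefox-profile-dir.
        Path cache = Files.createTempDirectory("demo").resolve("firefox-profile-dir");
        int[] builds = {0};
        ProfileBuilder builder = () -> {
            builds[0]++;
            Path dir = Files.createTempDirectory("fresh-profile");
            Files.write(dir.resolve("prefs.js"),
                    "// placeholder preferences\n".getBytes(StandardCharsets.UTF_8));
            return dir;
        };
        getOrCreate(cache, builder); // builds and copies
        getOrCreate(cache, builder); // reuses the cached directory
        System.out.println("builds = " + builds[0]); // prints "builds = 1"
    }
}
```

The second call returns immediately because the directory is already there, which is the same saving the answer relies on. Note that this only removes the profile setup cost; starting the browser and loading the page still take whatever time they take.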