Getting Jsoup to support HTML dynamically generated by JavaScript

Problem description

Right now I'm working on a web crawler. It should parse some specific sites and write its output to an XML file. Up to this point, that has been no problem. The crawler works, and you can customize it really quickly via a cfg file. I use Jsoup to parse the HTML content.

I just added a few more sites and noticed that I have a huge problem with HTML content that is created via JavaScript. Is there a way to make Jsoup support JavaScript? Or at least a way to get the full HTML content I can see in my browser?

I already tried HtmlUnit, but it didn't do well: it did not give me the content I get in my browser.

Sincerely,

Ogofo

Solution

Jsoup does not support JavaScript, and it does not emulate a browser. If you need the page's JavaScript to be executed, Jsoup alone will not do it. In my experience, HtmlUnit, which is a headless browser, has given me the best results (speaking only of Java frameworks).
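
To make the first point concrete, here is a minimal sketch of what Jsoup sees when the markup depends on a script. The class name `StaticParseDemo`, the helper `count`, and the sample page are made up for illustration and are not from the question. Jsoup parses the `<script>` body as plain data and never runs it, so the paragraph the script would insert is simply not in the parsed document.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticParseDemo {
    // Hypothetical page: the <p> inside #out would only exist after the script has run.
    static final String HTML =
        "<html><body><div id=\"out\"></div>"
        + "<script>document.getElementById('out').innerHTML = '<p>hello</p>';</script>"
        + "</body></html>";

    /** Counts the elements matching cssQuery in the markup exactly as Jsoup parses it. */
    static int count(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // The empty container is part of the static source, so it is found.
        System.out.println("#out matches:   " + count(HTML, "#out"));
        // The script was never executed, so the generated <p> is missing.
        System.out.println("#out p matches: " + count(HTML, "#out p"));
    }
}
```

A browser would show the paragraph for the same page, which is exactly the gap described in the question.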

One thing worth trying in HtmlUnit is changing the BrowserVersion (Chrome / Internet Explorer / Firefox) when creating the WebClient instance. Some sites respond differently depending on the browser, and sometimes just changing that value gives you the results you expect.
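
As a rough sketch of that suggestion, the snippet below loads a page with HtmlUnit under two different browser profiles and hands the rendered DOM to Jsoup, so the existing Jsoup selectors can keep working. Several things here are assumptions rather than part of the answer: the `org.htmlunit` package names (HtmlUnit 3.x and later; the 2.x releases use `com.gargoylesoftware.htmlunit` instead), the class and method names `RenderedFetch` and `fetchRendered`, the placeholder URL, and the five second wait, which may need tuning per site.

```java
import java.io.IOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RenderedFetch {

    /**
     * Loads the URL in HtmlUnit with the given browser profile, waits for
     * background scripts, and returns the resulting DOM as a Jsoup Document.
     */
    static Document fetchRendered(String url, BrowserVersion browser, long waitMillis)
            throws IOException {
        try (WebClient client = new WebClient(browser)) {
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setCssEnabled(false); // styling is not needed for scraping
            // Assumption: script errors on real-world pages should not abort the crawl.
            client.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = client.getPage(url);
            // Give asynchronous scripts (timers, AJAX) a chance to finish.
            client.waitForBackgroundJavaScript(waitMillis);

            // Serialize the DOM after script execution and re-parse it with Jsoup.
            return Jsoup.parse(page.asXml(), url);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real target as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";

        // Try more than one profile, since some sites answer differently per browser.
        BrowserVersion[] profiles = {BrowserVersion.CHROME, BrowserVersion.FIREFOX};
        for (BrowserVersion browser : profiles) {
            Document doc = fetchRendered(url, browser, 5_000);
            System.out.println(browser.getNickname() + ": "
                    + doc.getAllElements().size() + " elements after rendering");
        }
    }
}
```

Comparing the element counts (or a selector you care about) across profiles is a quick way to see whether the site treats the emulated browsers differently.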
