How can I get Firebug to match HtmlUnitDriver's pageSource report?

Problem description

I'm using Java with the Selenium Library to scrape a webpage. When I use Firebug on the page in Firefox, I can see that the page's source contains the following HTML structure:

<div>
    <div>
        <table>
            <caption />
            <thead />
            <tbody />
        </table>
    </div>
</div>

However, when I programmatically download the page's source using HtmlUnitDriver and then call driver.getPageSource(), I see that the corresponding HTML structure has changed to:

<div>
    <table>
        <caption />
        <tbody />
    </table>
</div>

  1. Why does HtmlUnitDriver's report differ from the one given by Firebug?
  2. Can I set up Firebug so that I can inspect the HTML structure as HtmlUnitDriver will report it?

Solution

Note that Firebug itself does not alter the HTML structure in this way; Firefox's integrated developer tools should show you the same structure.

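I assume the second wrapping <div> and the <thead> are added by JavaScript running on the page. Firebug shows the live DOM after those scripts have run, whereas HtmlUnitDriver has JavaScript disabled by default (new HtmlUnitDriver(true) enables it), so its page source would not contain the script-generated elements.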

You can check that by disabling JavaScript, e.g. by going to about:config and setting javascript.enabled to false or via an add-on like NoScript or Ghostery.
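After reloading the page with JavaScript disabled, one way to confirm which elements differ is to tally the opening tags in the markup copied from Firebug and in the string returned by driver.getPageSource(). The sketch below uses only the JDK; the class TagDiff and its methods countOpeningTags and diff are hypothetical helpers, and counting tags with a regular expression is a rough heuristic, not a real HTML parser. The two hard-coded strings are simply the structures from the question flattened onto one line.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compares opening-tag counts between two copies of a page's markup. */
public class TagDiff {

    // Captures the name of an opening tag such as "<div" or "<thead".
    // Closing tags like "</div>" do not match because '/' is not a letter.
    private static final Pattern OPEN_TAG = Pattern.compile("<\\s*([a-zA-Z][a-zA-Z0-9]*)");

    /** Counts opening tags per lower-cased tag name (regex heuristic, not an HTML parser). */
    static Map<String, Integer> countOpeningTags(String html) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = OPEN_TAG.matcher(html);
        while (m.find()) {
            counts.merge(m.group(1).toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns tag -> (count in first - count in second), only for tags whose counts differ. */
    static Map<String, Integer> diff(String first, String second) {
        Map<String, Integer> a = countOpeningTags(first);
        Map<String, Integer> b = countOpeningTags(second);
        TreeSet<String> tags = new TreeSet<>(a.keySet());
        tags.addAll(b.keySet());
        Map<String, Integer> result = new TreeMap<>();
        for (String tag : tags) {
            int delta = a.getOrDefault(tag, 0) - b.getOrDefault(tag, 0);
            if (delta != 0) {
                result.put(tag, delta);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Structure as shown by Firebug (live DOM, after the page's scripts have run).
        String firebug = "<div><div><table><caption /><thead /><tbody /></table></div></div>";
        // Structure as returned by driver.getPageSource() in the question.
        String pageSource = "<div><table><caption /><tbody /></table></div>";
        // Prints {div=1, thead=1}: one extra <div> and one extra <thead> on the Firebug side.
        System.out.println(diff(firebug, pageSource));
    }
}
```

In practice one would pass the markup copied from Firebug's HTML panel and the actual getPageSource() string instead of the hard-coded examples. Whitespace, indentation and attribute order may differ between the two copies, so comparing tag counts tends to be less noisy than a line-by-line text diff.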
