Html-Agility-Pack not loading the page with full content?


Question

I am using Html Agility Pack to fetch data from a website (scraping).

My problem is that the website I am fetching data from loads some of its content a few seconds after the page itself has loaded.

So whenever I try to read data from a particular div, I get null.

In the code below, the page variable never contains the reviewBox div, because that div has not been loaded yet.

public void FetchAllLinks(String Url)
{
    Url = "http://www.tripadvisor.com/";
    HtmlDocument page = new HtmlWeb().Load(Url);

    // SelectNodes returns null when no node matches, which is what happens here
    var link_list = page.DocumentNode.SelectNodes("//div[@class='reviewBox']");

    foreach (var link in link_list)
    {
        htmlpage.InnerHtml = link.InnerHtml;
    }
}

So can anyone please tell me how to delay the request so that

HtmlDocument page = new HtmlWeb().Load(Url);

loads the page variable only once the content is there?

Recommended answer

It's not about delaying the request. That node is populated by JavaScript through the DOM, and Html Agility Pack is the wrong tool for that requirement (it isn't a web engine at all; it only loads the base HTML).
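To make the point concrete, here is a minimal sketch that needs no network access. The sample markup, the class name StaticParseDemo, and the helper CountReviewBoxes are made up for illustration; only HtmlDocument.LoadHtml and SelectNodes come from Html Agility Pack. The reviewBox div in the sample would exist only after a browser ran the script, so the parser never sees it.

```csharp
using System;
using HtmlAgilityPack;

public static class StaticParseDemo
{
    // Hypothetical page: the reviewBox div is created by script, not present in the markup.
    public const string SampleHtml =
        "<html><body><div id='content'></div>" +
        "<script>" +
        "var d = document.createElement('div');" +
        "d.className = 'reviewBox';" +
        "document.getElementById('content').appendChild(d);" +
        "</script></body></html>";

    // Counts the reviewBox divs that Html Agility Pack can see in the given markup.
    public static int CountReviewBoxes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses the text only; scripts are never executed

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//div[@class='reviewBox']");
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Script-generated div: invisible to the parser.
        Console.WriteLine(CountReviewBoxes(SampleHtml)); // 0

        // The same div written directly in the markup is found.
        Console.WriteLine(CountReviewBoxes("<div class='reviewBox'>static</div>")); // 1
    }
}
```

Checking the result of SelectNodes for null before iterating also avoids the NullReferenceException that a foreach over a missing node set would throw.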

When I need to get at content that requires a full web engine to parse, I typically use WatiN. It's designed to help unit test actual web pages, but that means it allows programmatic access to web pages through a given browser engine and will load the full document. It comes with IE and Firefox drivers out of the box, and I vaguely recall that Chrome wasn't hard to use, either.
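A rough sketch of that approach, not a tested recipe: it assumes WatiN and Html Agility Pack are both referenced, that Internet Explorer is installed, and that the target div carries exactly the class reviewBox. The class name RenderedPageDemo and the 30-second timeout are arbitrary choices for illustration. The browser renders the page, and the resulting markup is then handed to Html Agility Pack for the same XPath query as in the question.

```csharp
using System;
using WatiN.Core;

public static class RenderedPageDemo
{
    [STAThread] // WatiN drives Internet Explorer through COM, which needs an STA thread
    public static void Main()
    {
        const string url = "http://www.tripadvisor.com/";

        using (var browser = new IE(url))
        {
            // Wait (up to 30 seconds, an arbitrary limit) for the
            // script-generated div to appear in the live DOM.
            browser.Div(Find.ByClass("reviewBox")).WaitUntilExists(30);

            // Parse the rendered markup with Html Agility Pack.
            // The type is fully qualified to avoid clashes with WatiN names.
            var page = new HtmlAgilityPack.HtmlDocument();
            page.LoadHtml(browser.Html);

            var boxes = page.DocumentNode.SelectNodes("//div[@class='reviewBox']");
            if (boxes == null)
            {
                Console.WriteLine("No reviewBox divs found.");
                return;
            }

            foreach (var box in boxes)
            {
                Console.WriteLine(box.InnerHtml);
            }
        }
    }
}
```

Waiting on the specific element rather than sleeping for a fixed time means the code proceeds as soon as the content exists and fails with a timeout if it never shows up.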
