Html Agility Pack not loading the page with full content?

Question

I am using Html Agility Pack to fetch data from a website (scraping).

My problem is that the website I am fetching data from loads some of its content a few seconds after the initial page load.

So whenever I try to read data from a particular div, I get null.

In the code below, the page variable does not contain the reviewBox div, because it has not been loaded yet.

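// Requires the HtmlAgilityPack package (using HtmlAgilityPack;).
public void FetchAllLinks(String Url)
{
    Url = "http://www.tripadvisor.com/";  // hard-coded; overrides the argument
    // Load() fetches only the HTML the server returns; scripts are not run.
    HtmlDocument page = new HtmlWeb().Load(Url);

    // SelectNodes returns null, not an empty list, when nothing matches.
    var link_list = page.DocumentNode.SelectNodes("//div[@class='reviewBox']");
    if (link_list == null)
    {
        return;
    }

    foreach (var link in link_list)
    {
        // htmlpage is not defined in this snippet.
        htmlpage.InnerHtml = link.InnerHtml;
    }
}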

So can anyone please tell me how to delay the request so that

HtmlDocument page = new HtmlWeb().Load(Url);

loads the full data into the page variable?

Recommended answer

It's not about delaying the request. That node is populated by JavaScript through the DOM, and the Html Agility Pack is the wrong tool for that requirement: it isn't a web engine at all, and it only loads the base HTML.

When I need to get at content that requires a full web engine to parse, I typically use WatiN. It's designed to help unit test actual web pages, which means it allows programmatic access to web pages through a given browser engine and will load the full document. It comes with IE and Firefox drivers out of the box, and I vaguely recall that Chrome wasn't hard to use, either.
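
As a rough illustration of that approach, the sketch below lets WatiN drive Internet Explorer until the script-generated div exists, then hands the rendered markup to Html Agility Pack so the original XPath query can stay as it was. It assumes WatiN 2.x and the HtmlAgilityPack package are referenced and that IE is installed. The class name RenderedPageScraper and the 30-second timeout are made up for the example, and the reviewBox class name is taken from the question rather than verified against the live site.

```csharp
using System;
using WatiN.Core;   // assumed: WatiN 2.x, which automates a real IE window

class RenderedPageScraper   // hypothetical name, for illustration only
{
    // WatiN talks to IE through COM, so the calling thread must be STA.
    [STAThread]
    static void Main()
    {
        const string url = "http://www.tripadvisor.com/";

        using (var browser = new IE(url))
        {
            // Wait for the initial document to finish loading.
            browser.WaitForComplete();

            // Then wait for the script-generated div; WatiN throws a timeout
            // exception if it has not appeared after 30 seconds (arbitrary budget).
            browser.Div(Find.ByClass("reviewBox")).WaitUntilExists(30);

            // Parse the rendered markup with Html Agility Pack.
            var page = new HtmlAgilityPack.HtmlDocument();
            page.LoadHtml(browser.Html);

            var reviewBoxes = page.DocumentNode.SelectNodes("//div[@class='reviewBox']");
            if (reviewBoxes == null)
            {
                Console.WriteLine("No reviewBox divs found.");
                return;
            }

            foreach (var box in reviewBoxes)
            {
                Console.WriteLine(box.InnerHtml);
            }
        }
    }
}
```

If XPath is not needed, the same data should be reachable through WatiN alone, for example by reading InnerHtml from the Div element directly, which avoids the second parse.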
