Running Scripts in HtmlAgilityPack

Question

I'm trying to scrape a particular webpage which works as follows.

First the page loads, then it runs some sort of javascript to fetch the data it needs to populate the page. I'm interested in that data.

If I get the page with HtmlAgilityPack, the script doesn't run, so I get what is essentially a mostly blank page.

Is there a way to force it to run a script, so I can get the data?

Recommended answer

You are getting what the server returns, the same as a web browser does. A web browser, of course, then runs the scripts. Html Agility Pack is an HTML parser only: it has no way to interpret the JavaScript or bind it to its internal representation of the document. If you want to run the script, you need a web browser. The perfect answer to your problem would be a complete "headless" web browser, that is, something that incorporates an HTML parser, a JavaScript interpreter, and a model that simulates the browser DOM, all working together. Basically, that's a web browser without the rendering part. At this time there isn't such a thing that works entirely within the .NET environment.
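The parser-only limitation is not specific to Html Agility Pack, so it can be sketched with any pure HTML parser. The following is a minimal illustration in Python, using the standard-library `html.parser` as a stand-in (it is not HtmlAgilityPack, and the page content, the `StaticCollector` class, and the `parse_static` helper are all hypothetical names made up for this example). It shows that the parser sees the script only as text, so the element the script would fill stays empty.

```python
from html.parser import HTMLParser

# Hypothetical page: the <div> stays empty until a browser runs the script.
PAGE = """<html><body>
<div id="data"></div>
<script>document.getElementById("data").textContent = "42 widgets";</script>
</body></html>"""


class StaticCollector(HTMLParser):
    """Collects the text inside <div id="data"> and the raw script source."""

    def __init__(self):
        super().__init__()
        self._context = None      # "div", "script", or None
        self.div_text = ""
        self.script_source = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "data":
            self._context = "div"
        elif tag == "script":
            self._context = "script"

    def handle_endtag(self, tag):
        if tag in ("div", "script"):
            self._context = None

    def handle_data(self, data):
        # The parser hands us script contents as plain text; nothing is executed.
        if self._context == "div":
            self.div_text += data
        elif self._context == "script":
            self.script_source += data


def parse_static(html):
    """Parse the markup exactly as served and return (div text, script source)."""
    collector = StaticCollector()
    collector.feed(html)
    collector.close()
    return collector.div_text, collector.script_source


div_text, script_source = parse_static(PAGE)
print(repr(div_text))                   # the div is still empty
print("42 widgets" in script_source)    # the data exists only inside unexecuted script text
```

A parser like this can still be useful on such pages when the data is embedded in the script text itself, but anything the script fetches at run time never appears in the parsed document.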

Your best bet is to use a WebBrowser control and actually load and run the page in Internet Explorer under programmatic control. This won't be fast or pretty, but it will do what you need to do.

Also see my answer to a similar question, "Load a DOM and Execute javascript, server side, with .Net" (http://stackoverflow.com/questions/10886161/load-a-dom-and-execute-javascript-server-side-with-net/10886733#10886733), which discusses the technology available in .NET for doing this. Unfortunately, most of the pieces exist right now but either aren't quite mature yet or haven't been integrated in the right way.
