Load dynamically generated HTML code in WebClient
Question
I am using WebClient.DownloadString to scrape a webpage. Unfortunately, DownloadString returns the page source without the CSS and JS updates (the ones Internet Explorer applies while the page loads).
So I was wondering: how can I use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does (with the CSS and JS code injections)?
Recommended answer
So I was wondering how can I use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does?
You can't do that. The WebClient class is used to download a SINGLE resource over HTTP. It has no concept of HTML. If you need the resources associated with an HTML page, you will have to parse the page with an HTML parser (HTML Agility Pack, for example) and, for each CSS and JavaScript reference you find in the downloaded HTML, send another HTTP request with the WebClient to retrieve it.
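The parse-then-fetch approach described above could look roughly like the following. This is a minimal sketch, not a complete scraper: it assumes the HtmlAgilityPack NuGet package is referenced, the class and method names (PageResources, FindStaticResources) are made up for illustration, and http://example.com/ is a placeholder URL. It only finds resources that appear statically in the markup as script src or link rel="stylesheet" attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class PageResources
{
    // Hypothetical helper: returns absolute URLs of external scripts and
    // stylesheets that are referenced statically in the given HTML.
    public static List<Uri> FindStaticResources(string html, Uri pageUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<Uri>();
        AddUrls(doc, "//script[@src]", "src", pageUri, urls);
        AddUrls(doc, "//link[@rel='stylesheet'][@href]", "href", pageUri, urls);
        return urls;
    }

    private static void AddUrls(HtmlDocument doc, string xpath, string attribute,
                                Uri pageUri, List<Uri> urls)
    {
        // SelectNodes returns null when nothing matches the XPath expression.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return;

        foreach (HtmlNode node in nodes)
        {
            string value = node.GetAttributeValue(attribute, "");
            // Resolve relative references against the page's own address.
            if (Uri.TryCreate(pageUri, value, out Uri absolute))
                urls.Add(absolute);
        }
    }

    public static void Main()
    {
        var pageUri = new Uri("http://example.com/"); // placeholder address
        using (var client = new WebClient())
        {
            // First request: the HTML document itself.
            string html = client.DownloadString(pageUri);

            // One additional request per referenced script or stylesheet.
            foreach (Uri resource in FindStaticResources(html, pageUri))
            {
                string content = client.DownloadString(resource);
                Console.WriteLine("{0}: {1} characters", resource, content.Length);
            }
        }
    }
}
```

Each call to DownloadString still fetches exactly one resource; the parser only supplies the list of addresses to request.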
But bear in mind that, depending on the webpage you are trying to scrape, things might get more complicated. For example, the page could contain JavaScript that in turn dynamically references and includes other static resources, such as more JavaScript or CSS. Since a WebClient doesn't execute JavaScript, it might never know about them.
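To illustrate that limitation, here is a small sketch that needs no network access. The HTML string is a made-up page whose inline script injects a stylesheet at run time; the names DynamicResourceDemo, CountStylesheetLinks, loader.js and injected.css are hypothetical, and the HtmlAgilityPack NuGet package is assumed to be referenced. Parsing the downloaded text finds no stylesheet link at all, because the link element would only exist after a browser had run the script.

```csharp
using System;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class DynamicResourceDemo
{
    // Hypothetical page source, as DownloadString would return it:
    // the stylesheet is added by script, not present in the markup.
    public const string Html = @"<html><head>
<script src=""loader.js""></script>
<script>
  var l = document.createElement('link');
  l.rel = 'stylesheet';
  l.href = 'injected.css';
  document.head.appendChild(l);
</script>
</head><body></body></html>";

    // Counts stylesheet <link> elements that exist in the static markup.
    public static int CountStylesheetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//link[@rel='stylesheet']");
        return links == null ? 0 : links.Count;
    }

    public static void Main()
    {
        // The script is never executed, so injected.css is never discovered.
        Console.WriteLine(CountStylesheetLinks(Html)); // prints 0
    }
}
```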