Load dynamically generated HTML code in WebClient


Question

I am using WebClient.DownloadString to scrape a web page. Unfortunately, DownloadString gives me the page source without the CSS and JS updates (the ones Internet Explorer applies while the page loads).

So I was wondering: how can I use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does (with the CSS and JS code injections)?

Recommended answer

"So I was wondering: how can I use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does?"

You can't do that. The WebClient class downloads a single resource over HTTP. It has no concept of HTML. If you need the resources associated with this HTML, you will have to use an HTML parser (such as HTML Agility Pack) and, for each CSS and JavaScript reference you encounter in the downloaded page, send another HTTP request with the WebClient to retrieve it.
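The parse-then-fetch approach described above could look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is installed. The class and method names (PageResources, ExtractResourceUrls, DownloadWithResources) are hypothetical and chosen only for illustration. It handles only script tags with a src attribute and link tags whose rel is exactly "stylesheet", and it does no error handling for failed requests.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack" is referenced

public static class PageResources // hypothetical helper, not part of any library
{
    // Returns absolute URLs of the external scripts and stylesheets
    // that are referenced statically in the given markup.
    public static List<Uri> ExtractResourceUrls(string html, Uri pageUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<Uri>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(
            "//script[@src] | //link[@rel='stylesheet'][@href]");
        if (nodes == null)
            return urls;

        foreach (HtmlNode node in nodes)
        {
            string attribute = node.Name == "script" ? "src" : "href";
            string value = node.GetAttributeValue(attribute, "");
            if (string.IsNullOrWhiteSpace(value))
                continue;

            // Resolve relative references against the page URL.
            if (Uri.TryCreate(pageUri, value, out Uri absolute))
                urls.Add(absolute);
        }
        return urls;
    }

    // Downloads the page, then issues one extra request per referenced resource.
    public static Dictionary<Uri, string> DownloadWithResources(Uri pageUri)
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(pageUri);
            var result = new Dictionary<Uri, string> { [pageUri] = html };

            foreach (Uri url in ExtractResourceUrls(html, pageUri))
            {
                if (!result.ContainsKey(url))
                    result[url] = client.DownloadString(url);
            }
            return result;
        }
    }
}
```

Calling PageResources.DownloadWithResources(new Uri("http://example.com/")) would return the page source plus the text of each statically referenced script and stylesheet. Nothing is executed or applied to the page, so the result is still not the DOM a browser would show.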

But bear in mind that, depending on the web page you are trying to scrape, things might get more complicated. For example, the page could contain JavaScript that in turn dynamically references and includes other static resources, such as more JavaScript or CSS. Because a WebClient doesn't execute JavaScript, it might never know about them.
