Using CefSharp.Offscreen to retrieve a web page that requires JavaScript to render

Question

I have what is hopefully a simple task, but it's going to take someone that's versed in CefSharp to solve it.

I have a URL that I want to retrieve the HTML from. The problem is that this particular URL doesn't actually serve the page in response to a GET. Instead, it pushes a mound of JavaScript to the browser, which then executes it and produces the actual rendered page. This means that the usual approaches involving HttpWebRequest and HttpWebResponse aren't going to work.

I've looked at a number of different "headless" options, and the one that I think best meets my needs for a number of reasons is CefSharp.Offscreen. But I'm at a loss as to how this thing works. I see that there are several events that can be subscribed to, and some configuration options, but I don't need anything like an embedded browser.

All I really need is a way to do something like this (pseudocode):

string html = CefSharp.Get(url);

I don't have a problem subscribing to events, if that's what's needed to wait for the Javascript to execute and produce the rendered page.

Recommended answer

If you can't get a headless version of Chromium to help you, you could try node.js and jsdom. It's easy to install and play with once you have node up and running. You can see simple examples in the GitHub README where they pull down a URL, run all the JavaScript, including any custom JavaScript code (example: jQuery bits to count some type of elements), and then you have the HTML in memory to do what you want. You can just do $('body').html() and get a string, like in your pseudocode. (This even works for stuff like generating SVG graphics, since that is just more XML tree nodes.)

If you need this as part of a larger C# app that you need to distribute, your idea to use CefSharp.Offscreen sounds reasonable. One approach might be to get things working with CefSharp.WinForms or CefSharp.WPF first, where you can literally see things, then try CefSharp.Offscreen later when this all works. You can even get some JavaScript running in the on-screen browser to pull down body.innerHTML and return it as a string to the C# side of things before you go headless. If that works, the rest should be easy.

Perhaps start with CefSharp.MinimalExample and get that compiling, then tweak it for your needs. You need to be able to set webBrowser.Address in your C# code, and you need to know when the page has loaded. Then you need to call webBrowser.EvaluateScriptAsync(".. JS code ..") with your JavaScript code (as a string), which will do something like what's described above (returning bodyElement.innerHTML as a string).
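
For what it's worth, here is a rough sketch of how those steps might fit together in a console app using the off-screen browser. Treat it as a starting point rather than a definitive implementation: the class and method names (RenderedPageFetcher, GetRenderedHtmlAsync), the example URL, and the one-second settle delay are all made up for illustration, and the exact CefSharp API surface varies between versions. Note that the namespace is spelled CefSharp.OffScreen.

```csharp
using System;
using System.Threading.Tasks;
using CefSharp;
using CefSharp.OffScreen;

public static class RenderedPageFetcher
{
    // Hypothetical helper: loads the URL off-screen and returns the rendered DOM as HTML.
    public static async Task<string> GetRenderedHtmlAsync(string url)
    {
        using (var browser = new ChromiumWebBrowser(url))
        {
            // Completed once the browser reports that it has stopped loading.
            var loaded = new TaskCompletionSource<bool>(
                TaskCreationOptions.RunContinuationsAsynchronously);

            EventHandler<LoadingStateChangedEventArgs> handler = null;
            handler = (sender, args) =>
            {
                if (!args.IsLoading)
                {
                    browser.LoadingStateChanged -= handler;
                    loaded.TrySetResult(true);
                }
            };
            browser.LoadingStateChanged += handler;

            await loaded.Task;

            // Assumption: a fixed delay is a crude way to let page scripts finish
            // building the DOM. Polling for a known element would be more robust.
            await Task.Delay(1000);

            // Ask the page itself for its current markup.
            JavascriptResponse response =
                await browser.EvaluateScriptAsync("document.documentElement.outerHTML");

            if (!response.Success)
            {
                throw new InvalidOperationException(response.Message);
            }

            return response.Result as string;
        }
    }

    public static void Main()
    {
        // CEF must be initialized once per process and shut down on the same thread.
        Cef.Initialize(new CefSettings());
        try
        {
            string html = GetRenderedHtmlAsync("https://example.com")
                .GetAwaiter().GetResult();
            Console.WriteLine(html);
        }
        finally
        {
            Cef.Shutdown();
        }
    }
}
```

If you only want the body, swap the script string for "document.body.innerHTML". The "stopped loading" signal only tells you the network load finished, not that the page's own scripts are done rendering, so the wait strategy is the part most likely to need tuning for your particular site.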
