Get HTML source code from CefSharp web browser
Problem description
I am using a CefSharp.Wpf.ChromiumWebBrowser (version 47.0.3.0) to load a web page. At some point after the page has loaded, I want to get the source code.
I called:
wb.GetBrowser().MainFrame.GetSourceAsync()
However, it does not appear to return all of the source code (I believe this is because there are child frames).
If I call:
wb.GetBrowser().MainFrame.ViewSource()
I can see that it lists all of the source code (including the inner frames).
I would like to get the same result as ViewSource(). Could someone point me in the right direction, please?
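Update: added code sample
Note: the address the web browser points to is only valid up to and including 10 March 2016 (the r_date in the URL). After that it may show different data from what I am trying to look at.
In the frmSelection.xaml file: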
```xml
<cefSharp:ChromiumWebBrowser Name="wb" Grid.Column="1" Grid.Row="0" />
```
In the frmSelection.xaml.cs file:
```csharp
public partial class frmSelection : UserControl
{
    private System.Windows.Threading.DispatcherTimer wbTimer = new System.Windows.Threading.DispatcherTimer();

    public frmSelection()
    {
        InitializeComponent();

        // This timer will start when a web page has been loaded.
        // It will wait 4 seconds and then call wbTimer_Tick which
        // will then see if data can be extracted from the web page.
        wbTimer.Interval = new TimeSpan(0, 0, 4);
        wbTimer.Tick += new EventHandler(wbTimer_Tick);

        wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";
        wb.FrameLoadEnd += new EventHandler<CefSharp.FrameLoadEndEventArgs>(wb_FrameLoadEnd);
    }

    void wb_FrameLoadEnd(object sender, CefSharp.FrameLoadEndEventArgs e)
    {
        if (wbTimer.IsEnabled)
            wbTimer.Stop();

        wbTimer.Start();
    }

    void wbTimer_Tick(object sender, EventArgs e)
    {
        wbTimer.Stop();
        string html = GetHTMLFromWebBrowser();
    }

    private string GetHTMLFromWebBrowser()
    {
        // call the ViewSource method which will open up notepad and display the html.
        // this is just so I can compare it to the html returned in GetSourceAsync()
        // This is displaying all the html code (including child frames)
        wb.GetBrowser().MainFrame.ViewSource();

        // Get the html source code from the main Frame.
        // This is displaying only code in the main frame and not any child frames of it.
        Task<String> taskHtml = wb.GetBrowser().MainFrame.GetSourceAsync();
        string response = taskHtml.Result;
        return response;
    }
}
```
Answer
I don't think I quite get this DispatcherTimer solution. I would do it like this:
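```csharp
public frmSelection()
{
    InitializeComponent();

    wb.FrameLoadEnd += WebBrowserFrameLoadEnded;
    wb.Address = "http://www.racingpost.com/horses2/cards/card.sd?race_id=644222&r_date=2016-03-10#raceTabs=sc_";
}

private void WebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
{
    // FrameLoadEnd is raised for every frame that finishes loading (child frames too);
    // only grab the source once the main frame is done.
    if (e.Frame.IsMain)
    {
        // Opens the source in the default text viewer, for comparison.
        wb.ViewSource();

        wb.GetSourceAsync().ContinueWith(taskHtml =>
        {
            // The continuation runs after the task has completed,
            // so reading Result here does not block.
            var html = taskHtml.Result;
        });
    }
}
```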
I did a diff on the output of ViewSource and the text in the html variable, and they are the same, so I can't reproduce your problem here.
That said, I noticed that the main frame gets loaded pretty late, so you have to wait quite a while until Notepad pops up with the source.
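The question suspects child frames. In the answer's test the main frame's source already matched ViewSource, but if the markup you need really does live inside an iframe, each frame's source has to be fetched on its own. The sketch below is a hypothetical helper (`FrameSourceCollector.CollectAsync` is not part of CefSharp) that awaits each frame's source in turn, rather than blocking on `.Result`, and joins the results. The per-frame getter is passed in as a delegate, so the helper itself has no CefSharp dependency.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not a CefSharp API: concatenates the HTML of several frames.
public static class FrameSourceCollector
{
    // frameIds: the frames to read, in order.
    // getSourceAsync: returns the HTML of one frame (in CefSharp, a wrapper around IFrame.GetSourceAsync()).
    public static async Task<string> CollectAsync(
        IEnumerable<long> frameIds,
        Func<long, Task<string>> getSourceAsync)
    {
        var sb = new StringBuilder();
        foreach (long id in frameIds)
        {
            // Await each frame's source instead of blocking on .Result.
            string html = await getSourceAsync(id).ConfigureAwait(false);

            // Mark where each frame's document starts so the pieces stay distinguishable.
            sb.Append("<!-- frame ").Append(id).Append(" -->\n");
            sb.Append(html).Append('\n');
        }
        return sb.ToString();
    }
}
```

In a CefSharp 47 app this might be wired up as `var browser = wb.GetBrowser();` followed by `await FrameSourceCollector.CollectAsync(browser.GetFrameIdentifiers(), id => browser.GetFrame(id).GetSourceAsync());` (with `using CefSharp;`). This assumes the `IBrowser.GetFrameIdentifiers()` and `IBrowser.GetFrame(long)` members of that release; newer CefSharp versions may use a different frame identifier type, so check the API of the version you use. As in the answer, call it only after the main frame's FrameLoadEnd, since frames may otherwise still be loading.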