Get the final generated HTML source using C# or VB.NET

Problem description

Using VB.NET or C#, how do I get the generated HTML source?

To get the HTML source of a page I can use the code below, but it won't return the generated source: it won't contain any of the HTML that JavaScript added dynamically in the browser. How do I get the final generated HTML source?

Thanks

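// Requires: using System.IO; using System.Net;
WebRequest req = WebRequest.Create("http://www.asp.net");
string html;
using (WebResponse res = req.GetResponse())
using (StreamReader sr = new StreamReader(res.GetResponseStream()))
{
    html = sr.ReadToEnd(); // raw server response; scripts are not executed
}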

If I try the code below, it returns the document without the content injected by JavaScript:

Public Class Form1

    Dim WB As WebBrowser = Nothing

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        WB = New WebBrowser()
        Me.Controls.Add(WB)
        AddHandler WB.DocumentCompleted, AddressOf WebBrowser1_DocumentCompleted

        WB.Navigate("mysite/Default.aspx")
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        'Dim htmlcode As String = WB.Document.Body.OuterHtml
        ' DocumentText returns the page source as loaded, not the live DOM.
        Dim s As String = WB.DocumentText
    End Sub
End Class

HTML returned

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>

</head>
<body>
    <form id="form1" runat="server">
    <div id="center_text_panel">
    //test text  this text should be here
    </div>
    </form>
</body>
</html>

    <script type="text/javascript">

        document.getElementById("center_text_panel").innerText = "test text";


    </script>

Solution

You can use WebKit.NET.

See the official WebKit.NET tutorials.

It can not only grab the source, but also process JavaScript through the page load event.

webKitBrowser1.Navigate(MyURL)

Then handle the DocumentCompleted event and read the document text:

Dim documentContent As String = webKitBrowser1.DocumentText
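For C# readers, the two steps above might be wired together roughly as follows. This is a minimal sketch, not a verified implementation: it assumes the WebKit.NET assembly is referenced and exposes a `WebKitBrowser` control in a `WebKit` namespace, and it uses only the `Navigate`, `DocumentCompleted`, and `DocumentText` members named in the answer. The class name `RenderedSourceForm` and the URL are hypothetical.

```csharp
using System;
using System.Windows.Forms;
using WebKit; // assumption: namespace exposed by the WebKit.NET assembly

public class RenderedSourceForm : Form
{
    // Mirrors the webKitBrowser1 control used in the answer.
    private readonly WebKitBrowser webKitBrowser1 = new WebKitBrowser();

    public RenderedSourceForm(string url)
    {
        webKitBrowser1.Dock = DockStyle.Fill;
        Controls.Add(webKitBrowser1);

        // Read the markup only after the page has finished loading.
        webKitBrowser1.DocumentCompleted += (sender, e) =>
        {
            string documentContent = webKitBrowser1.DocumentText;
            Console.WriteLine(documentContent);
        };

        // Start navigation once the form (and the hosted control) has loaded.
        Load += (sender, e) => webKitBrowser1.Navigate(url);
    }

    [STAThread]
    public static void Main()
    {
        // Placeholder URL taken from the question's example.
        Application.Run(new RenderedSourceForm("http://www.asp.net"));
    }
}
```

Whether `DocumentText` reflects script-made changes is the answer's claim; it is worth checking the output against a page like the one in the question before relying on it.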

Edit: this might be a better open-source WebKit option: http://code.google.com/p/open-webkit-sharp/
