Get HTMLDocument from HttpWebRequest without HtmlAgilityPack

Question

I'm trying to write a function that returns an "HTMLDocument" using "HttpWebRequest" instead of a browser, but I'm stuck on transferring the inner HTML.

I don't understand how to set the value of "mWebPage", because VB doesn't accept "New" for HTMLDocument.

I know that I could use "HtmlAgilityPack", but I would like to test my current code by changing only the web request and not all of the parsing code (to do this I need an HtmlDocument).

After this test, I'll also try to change the parsing code.

Function mWebRe(ByVal mUrl As String) As HTMLDocument
    Dim request As HttpWebRequest = CType(WebRequest.Create(mUrl), HttpWebRequest)

    ' Set some reasonable limits on resources used by this request
    request.MaximumAutomaticRedirections = 4
    request.MaximumResponseHeadersLength = 4

    ' Set credentials to use for this request.
    request.Credentials = CredentialCache.DefaultCredentials

    'Here I've tried many types
    Dim mWebPage As HTMLDocument
    Try
        Dim request2 As HttpWebRequest = WebRequest.Create(mUrl)
        Dim response2 As HttpWebResponse = request2.GetResponse()
        Dim reader2 As StreamReader = New StreamReader(response2.GetResponseStream())
        Dim WebContent As String = reader2.ReadToEnd()

        'This is my last attempt
        'This gives Null Reference Exception
        mWebPage.Body.InnerHtml = WebContent


    Catch ex As Exception
        MsgBox(ex.ToString) 
    End Try

    Return mWebPage
End Function

I've tried many ways (including importing the HTML Object Library), but nothing worked :(

Solution

I found a solution on the web and modified my code as below. To make it work, you must add a reference to the "Microsoft HTML Object Library" (under the COM references).

It is obsolete, but it seems to be the only way to create an HTML document without using a WebBrowser control.

I hope it helps someone else.

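' Requires a COM reference to "Microsoft HTML Object Library",
' plus "Imports System.IO" and "Imports System.Net" at the top of the file.
Function mWebRe(ByVal mUrl As String) As MSHTML.HTMLDocument
    Dim request As HttpWebRequest = CType(WebRequest.Create(mUrl), HttpWebRequest)
    Dim doc As MSHTML.IHTMLDocument2 = CType(New MSHTML.HTMLDocument, MSHTML.IHTMLDocument2)

    ' Set some reasonable limits on resources used by this request
    request.MaximumAutomaticRedirections = 4
    ' MaximumResponseHeadersLength is measured in kilobytes
    request.MaximumResponseHeadersLength = 4

    ' Set credentials to use for this request.
    request.Credentials = CredentialCache.DefaultCredentials

    Try
        Dim WebContent As String
        ' The Using blocks close the response and its stream once the HTML has been read.
        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                WebContent = reader.ReadToEnd()
            End Using
        End Using

        ' Load the downloaded markup into the MSHTML document.
        doc.clear()
        doc.write(WebContent)
        doc.close()

        ' To make sure that the data is fully loaded.
        While doc.readyState <> "complete"
            ' Uncomment for extra waiting (if needed)
            'System.Threading.Thread.Sleep(1000)
            Application.DoEvents()
        End While
    Catch ex As Exception
        MsgBox(ex.ToString)
    End Try

    Return CType(doc, MSHTML.HTMLDocument)
End Function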
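As a hedged usage sketch, the document returned by mWebRe can then be queried through the MSHTML DOM interfaces, for example to read the page title and the href of every link. The Sub name PrintTitleAndLinks is hypothetical; it assumes it lives in the same Form or class as mWebRe (which calls Application.DoEvents, so a WinForms project is assumed) and that the COM reference to "Microsoft HTML Object Library" is in place.

```vbnet
' Hypothetical usage sketch: place this Sub in the same Form (or class) as mWebRe.
' Assumes the COM reference to "Microsoft HTML Object Library" is already added.
Sub PrintTitleAndLinks(ByVal url As String)
    ' Download the page and load it into an MSHTML document.
    Dim doc As MSHTML.HTMLDocument = mWebRe(url)

    ' The page title, taken from the <title> element of the parsed document.
    System.Diagnostics.Debug.WriteLine("Title: " & doc.title)

    ' Walk every <a> element and print its href attribute.
    ' The second argument of getAttribute (lFlags) is 0, the default behavior.
    For Each element As MSHTML.IHTMLElement In doc.getElementsByTagName("a")
        System.Diagnostics.Debug.WriteLine(element.getAttribute("href", 0))
    Next
End Sub
```

The elements come back as MSHTML.IHTMLElement rather than System.Windows.Forms.HtmlElement, so parsing code written against the WebBrowser's HtmlDocument may need small adjustments.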
