WebBrowser and Returning the Raw HTML


Problem description


OK, I'm making a fairly simple application. It contains 2
web browsers; the top one is used so that you can view a
website (i.e. one you have created). Every time you load
a page, the HTML which was received is then sent to the
http://validator.w3.org website to validate your HTML /
XHTML.

So far I've got everything to work, even the part where
the HTML is posted to the w3.org website.

But all of the following commands (Browser1 is the main
WebBrowser control) produce a form of HTML for the
document, but all the tags get converted to uppercase and
parts of the document go missing such as the "DOCTYPE"...

Browser1.Document.ToString()
Browser1.Document.documentelement.outerhtml
Browser1.Document.documentelement.innerhtml
Browser1.Document.Body.outerhtml
Browser1.Document.Body.innerhtml
Browser1.Document.All(0).outerhtml
Browser1.Document.All(0).innerhtml
Browser1.Document.All(1).outerhtml
Browser1.Document.All(1).innerhtml
Browser1.Document.All(2).outerhtml
Browser1.Document.All(2).innerhtml

NOTE: The HTML sent to the w3.org website must be exactly
the same as what the server sends, otherwise what's the
point in validating it?

Finally, because it will be used on interactive websites
(with a user login), you can't use controls such as
Inet to return the HTML. The user (main browser)
will make a request to the server (which may delete a
record), then the Inet or Winsock control (etc.) will make a
second request, and that will return a different page
(saying you can't delete a record).

Solution

Hi Craig

Because the DOCTYPE tag is outside the main document, it is not included
when you retrieve inner and outer HTML. To include the entire file you will
need to use the IPersistStreamInit interface, e.g.

<interface>
Imports System.Runtime.InteropServices

' IPersistStreamInit interface
' (UCOMIStream is the .NET 1.x type; later framework versions use
' System.Runtime.InteropServices.ComTypes.IStream instead.)
<ComVisible(True), ComImport(), _
Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), _
InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)> _
Public Interface IPersistStreamInit
    Sub GetClassID(ByRef pClassID As Guid)

    <PreserveSig()> Function IsDirty() As Integer
    <PreserveSig()> Function Load(ByVal pstm As UCOMIStream) As Integer
    <PreserveSig()> Function Save(ByVal pstm As UCOMIStream, _
        ByVal fClearDirty As Boolean) As Integer
    <PreserveSig()> Function GetSizeMax(<InAttribute(), Out(), _
        MarshalAs(UnmanagedType.U8)> ByRef pcbSize As Long) As Integer
    <PreserveSig()> Function InitNew() As Integer
End Interface
</interface>

<code>
Dim ips As IPersistStreamInit

ips = DirectCast(Browser1.Document, IPersistStreamInit)

' strm must be a COM stream (UCOMIStream), as the Save signature requires
ips.Save(strm, False)
</code>

This will save the complete HTML to a stream, which you can turn into a
string.

Regarding the conversion to uppercase, is this actually a problem? The
change of case should not affect the validity of the parsing.

There are also two particular newsgroups which may give further help:

microsoft.public.inetsdk.programming.mshtml_hosting
microsoft.public.inetsdk.programming.webbrowser_ctl

HTH

Charles



Thank you for your quick reply.

But is that VB code? I've been using VB5/6 for several
years and that looks slightly C-like. This project is
being written in VB.NET, but I've only just upgraded and
am finding some of these new methods a little strange.

Also, re the tags being changed to uppercase: the reason
I mentioned it is that it shows the HTML
document is being changed, probably into a form that the
browser can easily understand (and is probably strict XML
even if the input wasn't XML based).

Anyway, thanks for giving me something else to try.

Craig


Got it, you put all the

<interface></interface>

before the "Public Class Form1" bit - so the first part
of the form, then the

<code></code>

in the function which returns the HTML code. Well, that
method doesn't bring up any errors apart from what "strm"
should be Dim'ed as. I've never used a stream before.

But thanks again, this is the most progress I've made in
the past 2 days!
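
On the open question of what "strm" should be: the thread does not say, so the following is only a sketch of one possible approach, not Charles's method. It assumes the IPersistStreamInit declaration from the reply above is in the same project, that Browser1.Document is the underlying COM document object (as with the ActiveX control; with the later Windows Forms WebBrowser control this would be Browser1.Document.DomDocument), and that the page bytes can be decoded with the system default encoding. The names RawHtml and GetRawHtml are made up for illustration. The stream itself comes from the OLE function CreateStreamOnHGlobal, which creates an in-memory COM stream.

```vb
Imports System
Imports System.Runtime.InteropServices
Imports System.Text

Module RawHtml

    ' OLE function that creates an in-memory COM stream.
    ' Passing IntPtr.Zero lets OLE allocate the memory itself.
    <DllImport("ole32.dll")> _
    Private Function CreateStreamOnHGlobal(ByVal hGlobal As IntPtr, _
        ByVal fDeleteOnRelease As Boolean, _
        ByRef ppstm As UCOMIStream) As Integer
    End Function

    ' Hypothetical helper: saves the document to a stream and returns it as a string.
    Public Function GetRawHtml(ByVal document As Object) As String
        Dim strm As UCOMIStream = Nothing
        Marshal.ThrowExceptionForHR(CreateStreamOnHGlobal(IntPtr.Zero, True, strm))

        Try
            ' Ask the document to persist itself into the stream.
            Dim ips As IPersistStreamInit = DirectCast(document, IPersistStreamInit)
            Marshal.ThrowExceptionForHR(ips.Save(strm, False))

            ' Find out how many bytes were written (1 = STATFLAG_NONAME).
            Dim info As New STATSTG()
            strm.Stat(info, 1)
            Dim size As Integer = CInt(info.cbSize)

            ' Rewind to the start (0 = STREAM_SEEK_SET) and read everything.
            strm.Seek(0, 0, IntPtr.Zero)
            Dim buffer(size - 1) As Byte
            If size > 0 Then strm.Read(buffer, size, IntPtr.Zero)

            ' Assumption: the bytes are in the system default code page.
            ' A page served in another charset needs the matching Encoding.
            Return Encoding.Default.GetString(buffer)
        Finally
            Marshal.ReleaseComObject(strm)
        End Try
    End Function

End Module
```

It would be called once the page has finished loading, for example `Dim html As String = GetRawHtml(Browser1.Document)`. Newer framework versions report UCOMIStream and this STATSTG as obsolete. The replacements live in System.Runtime.InteropServices.ComTypes, and the interface declaration would need the same change.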

