.NET document write with MSHTML


Question

I am using MSHTML for HTML parsing (version 7.0.3300.0, C:\Program Files\Microsoft.NET\Primary Interop Assemblies\Microsoft.mshtml.dll).

HTMLDocumentClass has a write method, so I used it, but it raises a COMException with ErrorCode -2147352571 (0x80020005, DISP_E_TYPEMISMATCH) and Message "Type mismatch." What is the reason for this? If the write method of HTMLDocumentClass is not meant to be used, why is it defined?

    using mshtml; // Microsoft.mshtml primary interop assembly

    HTMLDocumentClass getHTMLDocument(string html)
    {
        HTMLDocumentClass doc = new HTMLDocumentClass();

        doc.write(new object[] { html }); // raises exception
        doc.close();

        return doc;
    }

    HTMLDocumentClass getHTMLDocument2(string html)
    {
        HTMLDocumentClass doc = new HTMLDocumentClass();
        IHTMLDocument2 doc2 = (IHTMLDocument2)doc;
        doc2.write(new object[] { html });
        doc2.close();

        return doc;
    }

Solution

Okay, I found it. This is an interesting failure mode. All of the PIAs for Microsoft.mshtml that I have installed on my machine are outdated. There are no fewer than four of them, all version 7.0.3300.0 with a runtime target of 1.0.3705 (which is quite old).

The cause is the synthetic fooClass interop class that the type library importer generates. It exists to make events a bit easier to deal with, since events are done very differently in COM. The class is a flattened version of all the methods of all the interfaces combined. The current SDK version of the HTMLDocument coclass is declared as follows (from mshtml.idl):

[
    uuid(25336920-03F9-11cf-8FD0-00AA00686F13)
]
coclass HTMLDocument
{
    [default]           dispinterface DispHTMLDocument;
    [source, default]   dispinterface HTMLDocumentEvents;
    [source]            dispinterface HTMLDocumentEvents2;
    [source]            dispinterface HTMLDocumentEvents3;
                        interface IHTMLDocument2;
                        interface IHTMLDocument3;
                        interface IHTMLDocument4;
                        interface IHTMLDocument5;
                        interface IHTMLDocument6;
                        interface IHTMLDOMNode;
                        interface IHTMLDOMNode2;
                        interface IDocumentSelector;
                        interface IHTMLDOMConstructor;
};

If you use Object Browser on the interop library, you'll see that HTMLDocumentClass is missing the methods of the IHTMLDocument6, IDocumentSelector and IHTMLDOMConstructor interfaces. The write() method you are using sits past these interfaces.
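To make the answer's reasoning concrete, here is a purely conceptual C# model; it is not how the CLR actually dispatches COM calls. A caller that resolved write() to a position in an old flattened member table lands on a different member once a newer declaration has inserted members ahead of it. The interface names IOldInterface and INewInterface and the helper Flatten are hypothetical.

```csharp
using System;
using System.Linq;

// Flatten interfaces, in declaration order, into one member table
// (a stand-in for the synthetic XxxClass layout; illustration only).
static string[] Flatten(params (string Iface, string[] Methods)[] ifaces) =>
    ifaces.SelectMany(i => i.Methods.Select(m => $"{i.Iface}.{m}")).ToArray();

// Layout the old interop assembly was generated from (hypothetical members).
string[] oldLayout = Flatten(
    ("IOldInterface", new[] { "method1" }),
    ("IHTMLDocument2", new[] { "write", "close" }));

// Newer declaration with a hypothetical interface inserted ahead of IHTMLDocument2.
string[] newLayout = Flatten(
    ("IOldInterface", new[] { "method1" }),
    ("INewInterface", new[] { "method2" }),
    ("IHTMLDocument2", new[] { "write", "close" }));

// The caller bound write() to a position using the old layout...
int slot = Array.IndexOf(oldLayout, "IHTMLDocument2.write");

// ...but with the new layout that position holds a different member.
Console.WriteLine($"expected {oldLayout[slot]}, got {newLayout[slot]}");
// prints: expected IHTMLDocument2.write, got INewInterface.method2
```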

So if you call HTMLDocumentClass.write(), you end up calling the wrong method. The exception is raised because whatever method actually gets called is not happy with the argument you pass, and of course it isn't.
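The ErrorCode in the question is the signed 32-bit form of the HRESULT 0x80020005, DISP_E_TYPEMISMATCH. A minimal sketch, assuming you want to recognize that failure in code: it decodes the value and checks a COMException for it. ToHex and IsTypeMismatch are hypothetical helpers, and the exception is constructed by hand so the sketch runs without MSHTML.

```csharp
using System;
using System.Runtime.InteropServices;

// DISP_E_TYPEMISMATCH: the HRESULT behind the "Type mismatch." message.
const int DISP_E_TYPEMISMATCH = unchecked((int)0x80020005);

// Render a COMException.ErrorCode (a signed int) as an 8-digit hex HRESULT.
static string ToHex(int hresult) => "0x" + unchecked((uint)hresult).ToString("X8");

// True when a COM call failed because an argument had the wrong type.
bool IsTypeMismatch(COMException ex) => ex.ErrorCode == DISP_E_TYPEMISMATCH;

// The ErrorCode reported in the question decodes to DISP_E_TYPEMISMATCH.
Console.WriteLine(ToHex(-2147352571)); // prints 0x80020005

// Simulated without MSHTML: a COMException carrying the same HRESULT.
var ex = new COMException("Type mismatch.", -2147352571);
Console.WriteLine(IsTypeMismatch(ex)); // prints True
```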

This is a nasty failure mode. It came about because Microsoft broke a hard COM rule: changing a COM interface or coclass requires a different GUID, that is, a new [uuid] attribute in the declaration above. Doing so, however, would also have made new versions of Internet Explorer completely incompatible with old code that uses them. Rock and a hard place; backwards compatibility is quite sacred at Microsoft. The order of interface implementations in a coclass is not normally a problem for regular COM clients. In .NET, however, it breaks the layout of the synthetic XxxClass type that tlbimp generates.

I have never seen a case where that synthetic class was actually required, and I never use it myself. You can always obtain the correct interface pointer by casting in C#; the cast calls QueryInterface(), which returns the correct pointer regardless of the version. Your alternative is the proper workaround.
