Parse MathType MTEF data from OLE binary string

Question

I need to convert MathType equations in MS Word 2003 (or earlier) documents to MathML so that they render nicely on the web. MathType's built-in "Publish to MathPage" function does the job very well, but I want to integrate the equation conversion into my C# application. I couldn't find any API reference indicating that the MathType SDK exposes the MathPage export interface, so I need to figure out a way to convert the individual equations myself.

My current procedure is to convert the Word 2003 (or earlier) documents to the Open XML format (DOCX). After the conversion, I can see that each embedded MathType OLE object is saved as binary data inside the DOCX package. The next step is to decode the MTEF data from that embedded object binary, so I tried to extract the MTEF by following the official documentation on the MathType MTEF header.

The base64 binary string representing an embedded object created by MathType was extracted from this MS Word test DOCX file: https://dl.dropbox.com/u/4625393/t1.docx

The MTEF header definition:

MTEF data is saved as the native data format of the object. Whenever an equation object is to be written to an OLE "stream", a 28-byte header is written, followed by the MTEF data. The C struct for this header is as follows:

struct EQNOLEFILEHDR {
    WORD    cbHdr;     // length of header, sizeof(EQNOLEFILEHDR) = 28 bytes
    DWORD   version;   // hiword = 2, loword = 0
    WORD    cf;        // clipboard format ("MathType EF")
    DWORD   cbObject;  // length of MTEF data following this header in bytes
    DWORD   reserved1; // not used
    DWORD   reserved2; // not used
    DWORD   reserved3; // not used
    DWORD   reserved4; // not used
};

The cf member is the return value of a call to the Windows API function RegisterClipboardFormat("MathType EF").
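For illustration, the 28-byte layout described above can also be read field by field instead of being marshalled into a struct. This is only a sketch under assumptions: `EqnOleFileHeader` and `TryRead` are hypothetical names that come neither from the MathType SDK nor from the documentation, and the byte array is assumed to begin exactly at the header. `BinaryReader` reads little-endian values, which matches how Windows writes `WORD` and `DWORD` fields.

```csharp
using System;
using System.IO;

// Hypothetical helper type; the fields mirror the documented C struct.
public sealed class EqnOleFileHeader
{
    public const int ExpectedSize = 28; // 2 + 4 + 2 + 4 + (4 * 4) bytes

    public ushort CbHdr;    // header length, expected to be 28
    public uint Version;    // hiword = 2, loword = 0, i.e. 0x00020000
    public ushort Cf;       // clipboard format registered for "MathType EF"
    public uint CbObject;   // length in bytes of the MTEF data after the header

    // Reads the header from the start of `data`. Returns null when the bytes
    // are too short or the length field is not 28.
    public static EqnOleFileHeader TryRead(byte[] data)
    {
        if (data == null || data.Length < ExpectedSize)
        {
            return null;
        }

        using (var reader = new BinaryReader(new MemoryStream(data, false)))
        {
            var header = new EqnOleFileHeader
            {
                CbHdr = reader.ReadUInt16(),
                Version = reader.ReadUInt32(),
                Cf = reader.ReadUInt16(),
                CbObject = reader.ReadUInt32()
            };
            // The four reserved DWORDs follow and are ignored here.
            return header.CbHdr == ExpectedSize ? header : null;
        }
    }
}
```

A null result from `TryRead` suggests that the bytes do not start with an `EQNOLEFILEHDR` at all, which is a cheap check to run before trying to interpret anything that follows as MTEF.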

Then I tried to convert it to the C# version:

[StructLayout(LayoutKind.Sequential, Pack=1)]
struct EQNOLEFILEHDR
{
    public UInt16 cbHdr;
    public UInt32 version;
    public UInt16 format;
    public UInt32 size;
    public UInt32 reserved1;
    public UInt32 reserved2;
    public UInt32 reserved3;
    public UInt32 reserved4;
}

With the header struct ready, the following code is trying to fill information in the header struct from the embedded object binary string.

foreach (EmbeddedObjectPart eop in wordDoc.MainDocumentPart.EmbeddedObjectParts)
{
    // Read the whole embedded object part into memory.
    byte[] buffer;
    using (Stream stream = eop.GetStream())
    using (BinaryReader reader = new BinaryReader(stream))
    {
        buffer = reader.ReadBytes((int)stream.Length);
    }

    // Copy the first sizeof(EQNOLEFILEHDR) bytes of the part into unmanaged
    // memory and marshal them into the header struct.
    int headerSize = Marshal.SizeOf(typeof(EQNOLEFILEHDR));
    IntPtr intp = Marshal.AllocHGlobal(headerSize);
    try
    {
        Marshal.Copy(buffer, 0, intp, headerSize);
        EQNOLEFILEHDR header = (EQNOLEFILEHDR)Marshal.PtrToStructure(intp, typeof(EQNOLEFILEHDR));
    }
    finally
    {
        Marshal.FreeHGlobal(intp);
    }
}

However, the data filled into the header struct isn't correct, which makes me think this is not the right way to parse the MTEF data from the embedded object binary string in the DOCX file.

I have also looked at the sample .NET code in the MathType SDK download and found that IDataObject is used to hold the MathType information and conversion procedures. So another approach was to use BinaryFormatter to see whether it can deserialize the binary string into an IDataObject, using BinaryFormatter.Deserialize(stream). That doesn't work either; it throws the exception "Binary stream '0' does not contain a valid BinaryHeader".

Is anything wrong with the methods I used to parse the MTEF data?

Solution

Kata, you should have received my email reply, but for anyone else interested, we have a sample which is modified from our SDK that we'd be happy to send to anyone who needs it. For anyone using it, it probably won't make much sense if you haven't downloaded the SDK. Please let me know if you'd like to give it a try.

Bob Mathews
Design Science
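For readers without the sample, one quick diagnostic may help explain why the header fields look wrong. The following is an assumption on my part and is not confirmed by the answer above: an embedded object part in a DOCX is normally an OLE compound file, so its first bytes would be the compound-file signature rather than an EQNOLEFILEHDR, and equation objects typically keep the header and MTEF in a stream named "Equation Native" inside that container. The sketch below only checks for the signature; `OleContainerCheck` and `LooksLikeCompoundFile` are hypothetical names.

```csharp
using System;

public static class OleContainerCheck
{
    // Eight-byte signature found at the start of every OLE compound file.
    private static readonly byte[] CompoundFileSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns true when `data` begins with the compound-file signature.
    public static bool LooksLikeCompoundFile(byte[] data)
    {
        if (data == null || data.Length < CompoundFileSignature.Length)
        {
            return false;
        }

        for (int i = 0; i < CompoundFileSignature.Length; i++)
        {
            if (data[i] != CompoundFileSignature[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

When the object is available as a base64 string, it can be decoded with `Convert.FromBase64String` and passed to `LooksLikeCompoundFile`. If the check returns true, the header would have to be read from the relevant stream inside the container (using a compound-file reader of your choice) rather than from offset 0 of the part.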
