How to parse MathML in the output of WordOpenXML?

Question

I only want to read the XML that Word uses to generate an equation, which I obtained through Paragraph.Range.WordOpenXML. However, the equation section of that XML is not MathML, even though I had read that Microsoft's equations are stored as MathML.

Do I need a special converter to get the desired XML, or is there another method?

Recommended answer

You can use the OMML2MML.XSL file (located under %ProgramFiles%\Microsoft Office\Office15) to transform the Office MathML (OMML) equations contained in a Word document into MathML.
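
Because the question starts from the string returned by Paragraph.Range.WordOpenXML rather than from a file on disk, the equations first have to be located in that XML. The following is a minimal sketch of one way to do that with LINQ to XML, assuming the string is well-formed XML and that equations appear as m:oMath elements in the Office Math namespace. The class and method names (OmmlExtractor, ExtractEquations) are hypothetical and not part of any Microsoft API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class OmmlExtractor
{
    // Namespace of Office MathML (OMML), conventionally bound to the "m" prefix.
    private static readonly XNamespace M =
        "http://schemas.openxmlformats.org/officeDocument/2006/math";

    // Returns the serialized XML of every m:oMath element found in the input.
    // The input is assumed to be the string obtained from Range.WordOpenXML.
    public static List<string> ExtractEquations(string wordOpenXml)
    {
        XDocument doc = XDocument.Parse(wordOpenXml);
        return doc.Descendants(M + "oMath")
                  .Select(e => e.ToString(SaveOptions.DisableFormatting))
                  .ToList();
    }
}
```

Each returned string is still OMML, not MathML, so it would still need to go through the XSL transformation described in this answer.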

The code below shows how to transform the equations in a Word document into MathML using the following steps:

  1. Open the Word document using the OpenXML SDK (version 2.5).
  2. Create an XslCompiledTransform and load the OMML2MML.XSL file.
  3. Transform the Word document by calling the Transform() method on the created XslCompiledTransform instance.
  4. Output the result of the transform (e.g. print it to the console or write it to a file).

I've tested the code below with a simple Word document containing two equations, text and pictures.

using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;
using DocumentFormat.OpenXml.Packaging;

public string GetWordDocumentAsMathML(string docFilePath, string officeVersion = "14")
{
    string officeML = string.Empty;
    using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, false))
    {
        string wordDocXml = doc.MainDocumentPart.Document.OuterXml;

        XslCompiledTransform xslTransform = new XslCompiledTransform();

        // The OMML2MML.XSL file ships with Microsoft Office; its folder depends on
        // the installed version, e.g. %ProgramFiles%\Microsoft Office\Office15\.
        xslTransform.Load(@"c:\Program Files\Microsoft Office\Office" + officeVersion + @"\OMML2MML.XSL");

        using (TextReader tr = new StringReader(wordDocXml))
        {
            // Load the xml of your main document part.
            using (XmlReader reader = XmlReader.Create(tr))
            {
                using (MemoryStream ms = new MemoryStream())
                {
                    XmlWriterSettings settings = xslTransform.OutputSettings.Clone();

                    // Configure xml writer to omit xml declaration.
                    settings.ConformanceLevel = ConformanceLevel.Fragment;
                    settings.OmitXmlDeclaration = true;

                    XmlWriter xw = XmlWriter.Create(ms, settings);

                    // Transform our OfficeMathML to MathML.
                    xslTransform.Transform(reader, xw);

                    // Flush buffered output into the stream before reading it back.
                    xw.Flush();
                    ms.Seek(0, SeekOrigin.Begin);

                    using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
                    {
                        officeML = sr.ReadToEnd();
                        // Console.Out.WriteLine(officeML);
                    }
                }
            }
        }
    }
    return officeML;
}

To convert a single equation rather than the whole Word document, query for the desired Office Math paragraph (m:oMathPara) and use that node's OuterXml property. The code below shows how to query for the first math paragraph:

// First() requires "using System.Linq;" and throws if the document has no math paragraph.
string mathParagraphXml =
      doc.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Math.Paragraph>().First().OuterXml;

Use the returned XML to feed the TextReader.
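
Feeding a single fragment into the transform could look roughly like the sketch below. It assumes the stylesheet has already been loaded into the XslCompiledTransform (OutputSettings is only available after Load) and that the fragment declares the namespaces it uses, as OuterXml output does. The class name OmmlFragmentConverter and its method are hypothetical helpers introduced here for illustration, and a StringWriter is swapped in for the MemoryStream used above.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class OmmlFragmentConverter
{
    // Transforms one XML fragment (for example the OuterXml of an m:oMathPara)
    // with an already loaded stylesheet and returns the result as a string.
    public static string Transform(XslCompiledTransform xslTransform, string ommlXml)
    {
        // Start from the stylesheet's own output settings, then omit the
        // XML declaration so the result can be embedded as a fragment.
        XmlWriterSettings settings = xslTransform.OutputSettings.Clone();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.OmitXmlDeclaration = true;

        using (XmlReader reader = XmlReader.Create(new StringReader(ommlXml)))
        using (StringWriter sw = new StringWriter())
        {
            // Disposing the XmlWriter flushes its buffer into the StringWriter.
            using (XmlWriter xw = XmlWriter.Create(sw, settings))
            {
                xslTransform.Transform(reader, xw);
            }
            return sw.ToString();
        }
    }
}

// Possible usage (the path varies by Office installation):
//   var xslt = new XslCompiledTransform();
//   xslt.Load(@"C:\Program Files\Microsoft Office\Office15\OMML2MML.XSL");
//   string mathMl = OmmlFragmentConverter.Transform(xslt, mathParagraphXml);
```

Loading the stylesheet once and passing it in avoids recompiling OMML2MML.XSL for every equation when a document contains many of them.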
