How to parse MathML in the output of WordOpenXML?
Question
I want to read only the XML used to generate an equation, which I obtained with Paragraph.Range.WordOpenXML. However, the part of that XML that describes the equation is not MathML, even though I had read that Microsoft's equations are based on MathML.
Do I need a special converter to get the desired XML, or are there other methods?
Recommended answer
You can use the OMML2MML.XSL file (located under %ProgramFiles%\Microsoft Office\Office15) to transform the Office Math Markup Language (OMML) equations contained in a Word document into MathML.
The code below transforms the equations in a Word document into MathML using the following steps:
- Open the Word document using the Open XML SDK (version 2.5).
- Create an XslCompiledTransform and load the OMML2MML.XSL file.
- Transform the Word document by calling the Transform() method on the XslCompiledTransform instance.
- Output the result of the transform (e.g. print it to the console or write it to a file).
I've tested the code below with a simple Word document containing two equations, text, and pictures.
```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;
using DocumentFormat.OpenXml.Packaging;

public static class WordMathConverter
{
    public static string GetWordDocumentAsMathML(string docFilePath, string officeVersion = "14")
    {
        string mathML = string.Empty;
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, false))
        {
            // XML of the main document part (document.xml), including the OMML equations.
            string wordDocXml = doc.MainDocumentPart.Document.OuterXml;

            XslCompiledTransform xslTransform = new XslCompiledTransform();
            // OMML2MML.XSL ships with Office, e.g. under
            // %ProgramFiles%\Microsoft Office\Office14\ or ...\Office15\
            string xslPath = Path.Combine(
                Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles),
                "Microsoft Office", "Office" + officeVersion, "OMML2MML.XSL");
            xslTransform.Load(xslPath);

            // Load the XML of the main document part.
            using (TextReader tr = new StringReader(wordDocXml))
            using (XmlReader reader = XmlReader.Create(tr))
            using (MemoryStream ms = new MemoryStream())
            {
                XmlWriterSettings settings = xslTransform.OutputSettings.Clone();
                // Allow fragment output, omit the XML declaration, keep ms open.
                settings.ConformanceLevel = ConformanceLevel.Fragment;
                settings.OmitXmlDeclaration = true;
                settings.CloseOutput = false;

                // Transform OMML to MathML. Disposing the writer flushes its
                // buffered output into ms before the stream is read back.
                using (XmlWriter xw = XmlWriter.Create(ms, settings))
                {
                    xslTransform.Transform(reader, xw);
                }

                ms.Seek(0, SeekOrigin.Begin);
                using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
                {
                    mathML = sr.ReadToEnd();
                }
            }
        }
        return mathML;
    }
}
```
To convert only a single equation (rather than the whole Word document), query for the desired Office Math paragraph (m:oMathPara) and use the OuterXml property of that node. The code below shows how to query for the first math paragraph:
```csharp
// Requires "using System.Linq;" for First().
string mathParagraphXml =
    doc.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Math.Paragraph>().First().OuterXml;
```
Use the returned XML to feed the TextReader (in place of wordDocXml in the method above).
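The single-equation route can be generalized to every equation in a piece of WordprocessingML. The following is a minimal sketch, not a library API. The class `OmmlConverter` and its methods `ExtractEquations` and `ToMathML` are hypothetical names. The sketch makes three assumptions:

- The input is an XML string, either document.xml content or the flat-OPC string returned by Range.WordOpenXML. Both contain the m: elements somewhere in the tree.
- OMML2MML.XSL has already been loaded into an XslCompiledTransform.
- The stylesheet accepts a standalone equation fragment as input, as the answer above implies.

`ExtractEquations` uses LINQ to XML to collect each display equation (m:oMathPara) and each inline equation (an m:oMath not wrapped in m:oMathPara). `ToMathML` runs one fragment through the transform.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

// Hypothetical helper class; not part of the Open XML SDK or of Office.
public static class OmmlConverter
{
    // Namespace of Office Math Markup Language (the m: prefix in WordprocessingML).
    public static readonly XNamespace M =
        "http://schemas.openxmlformats.org/officeDocument/2006/math";

    // Returns the serialized XML of every top-level equation in wordXml:
    // display equations (m:oMathPara) and inline equations (m:oMath whose
    // parent is not an m:oMathPara). Each string carries its own xmlns
    // declarations, so it can be parsed on its own.
    public static List<string> ExtractEquations(string wordXml)
    {
        XDocument doc = XDocument.Parse(wordXml);
        return doc.Descendants()
            .Where(e => e.Name == M + "oMathPara" ||
                        (e.Name == M + "oMath" && e.Parent?.Name != M + "oMathPara"))
            .Select(e => e.ToString(SaveOptions.DisableFormatting))
            .ToList();
    }

    // Runs one OMML fragment through an already-loaded transform
    // (assumed to be OMML2MML.XSL) and returns the output as a string.
    public static string ToMathML(string ommlXml, XslCompiledTransform xslt)
    {
        XmlWriterSettings settings = xslt.OutputSettings.Clone();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.OmitXmlDeclaration = true;

        using (StringReader sr = new StringReader(ommlXml))
        using (XmlReader reader = XmlReader.Create(sr))
        using (StringWriter sw = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(sw, settings))
            {
                xslt.Transform(reader, writer);
            } // Disposing the writer flushes its output into sw.
            return sw.ToString();
        }
    }
}
```

For example, load OMML2MML.XSL once, then pass each string returned by `ExtractEquations(paragraph.Range.WordOpenXML)` to `ToMathML`. Loading the stylesheet once and reusing it avoids recompiling the XSLT for every equation.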