How to add multiple equations inline with text in Apache POI Word?
Problem description
I am converting text containing LaTeX-style equations into an MS Word document using Apache POI. With some help I was able to implement it successfully, but if a line contains more than one equation, it produces an incorrect result.
Here is my code:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;
import uk.ac.ed.ph.snuggletex.SnuggleInput;
import uk.ac.ed.ph.snuggletex.SnuggleEngine;
import uk.ac.ed.ph.snuggletex.SnuggleSession;
import java.io.IOException;

public class CreateWordFormulaFromMathML {

    static File stylesheet = new File("MML2OMML.XSL");
    static TransformerFactory tFactory = TransformerFactory.newInstance();
    static StreamSource stylesource = new StreamSource(stylesheet);

    static CTOMath getOMML(String mathML) throws Exception {
        Transformer transformer = tFactory.newTransformer(stylesource);
        StringReader stringreader = new StringReader(mathML);
        StreamSource source = new StreamSource(stringreader);
        StringWriter stringwriter = new StringWriter();
        StreamResult result = new StreamResult(stringwriter);
        transformer.transform(source, result);
        String ooML = stringwriter.toString();
        stringwriter.close();
        CTOMath ctOMath = CTOMath.Factory.parse(ooML);
        return ctOMath.getOMathArray(0);
    }

    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument();
        String mstr = "The expression is as: $ax^2 + bx = c$ is easier to understand than $$ax^2 + \\frac{\\sin^{-1}\\theta}{\\cot{-1}} \\times y_1$$ or anything in \\[ ay^2 + b_2 \\theta\\]";
        XWPFParagraph paragraph = document.createParagraph();
        XWPFRun run = paragraph.createRun();
        // run.setText("");
        SnuggleEngine engine = new SnuggleEngine();
        SnuggleSession session = engine.createSession();
        SnuggleInput input = new SnuggleInput(mstr);
        session.parseInput(input);
        String mathML = session.buildXMLString();
        System.out.println("Input " + input.getString() + " was converted to:\n" + mathML + "\n\n");
        for (String s : mathML.split("\\s+(?=<math)|(?<=</math>)\\s+")) {
            if (s.startsWith("<math")) {
                CTOMath ctOMath = getOMML(s);
                System.out.println(s);
                CTP ctp = paragraph.getCTP();
                ctp.setOMathArray(new CTOMath[]{ctOMath});
            } else {
                run.setText(s + " ");
                System.out.println(s);
            }
        }
        document.write(new FileOutputStream("CreateWordFormulaFromMathML.docx"));
        document.close();
    }
}
This generates a document containing:
The expression is as: is easier to understand than or anything in ay^2+b_2 \theta
Note: (ay^2+b_2 \theta) is correctly rendered in Word equation format.
What I need is inline text with multiple equations in the middle.
Recommended answer
How should one approach tasks that involve creating Office OpenXML files such as *.docx?
Office OpenXML files such as *.docx are simply ZIP archives. We can unzip them and look at the internals. In a *.docx we find /word/document.xml, which contains the XML describing the document structure. For a paragraph with inline formulas we find something like:
<w:p>
 <w:r>
  <w:t>text</w:t>
 </w:r>
 <m:oMath>... </m:oMath>
 <w:r>
  <w:t>text</w:t>
 </w:r>
 <m:oMath>... </m:oMath>
 ...
</w:p>
So we need multiple runs holding the text and, between them, multiple <m:oMath>... </m:oMath> elements.
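The unzip-and-inspect step described above can also be scripted. The following is only a minimal sketch using the JDK's java.util.zip classes; the class name DocxXmlDump and the helper readEntry are made up for illustration, and the default file name is simply the one written by the code in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Apache POI.
public class DocxXmlDump {

    // Returns the content of the named ZIP entry as UTF-8 text,
    // or null if the archive has no such entry.
    static String readEntry(InputStream zipData, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return out.toString("UTF-8");
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the file produced by the question's code.
        String path = args.length > 0 ? args[0] : "CreateWordFormulaFromMathML.docx";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            // Inside the archive the entry name has no leading slash.
            System.out.println(readEntry(in, "word/document.xml"));
        }
    }
}
```

Running it on a document created in Word that contains text with inline equations shows the alternating <w:r> and <m:oMath> elements, which is the structure the generated file has to reproduce.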
That's why the paragraph has an OMathArray of type CTOMath[]. Your code overwrites this array with a new array holding a single CTOMath each time another CTOMath is found. Instead, each newly found CTOMath needs to be added to the array.
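The difference between overwriting and appending can be shown with a toy model. This sketch deliberately uses plain strings instead of CTOMath objects and does not call any POI API; the class and method names are invented for illustration. It only mirrors the logic of the two loops and shows why only the last equation ended up in the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: strings stand in for CTOMath objects.
public class OverwriteVsAppend {

    // Mirrors the question's loop: each equation replaces the whole array.
    static String[] overwriteEachTime(List<String> equations) {
        String[] stored = new String[0];
        for (String eq : equations) {
            stored = new String[] { eq };
        }
        return stored;
    }

    // Mirrors the intended behavior: each equation is appended.
    static String[] appendEachTime(List<String> equations) {
        List<String> stored = new ArrayList<>();
        for (String eq : equations) {
            stored.add(eq);
        }
        return stored.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> equations = Arrays.asList("eq1", "eq2", "eq3");
        System.out.println(Arrays.toString(overwriteEachTime(equations))); // prints [eq3]
        System.out.println(Arrays.toString(appendEachTime(equations)));    // prints [eq1, eq2, eq3]
    }
}
```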
To find out what we can do with org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP paragraphs, we need documentation for that class. The best I have found is grepcode.com. There we find CTP.addNewOMath() and CTP.setOMathArray(int, CTOMath).
So changing your code like this:
for (String s : mathML.split("\\s+(?=<math)|(?<=</math>)\\s+")) {
    if (s.startsWith("<math")) {
        CTOMath ctOMath = getOMML(s);
        System.out.println(s);
        CTP ctp = paragraph.getCTP();
        // append a new, empty oMath element, then replace it with the converted one
        ctp.addNewOMath();
        ctp.setOMathArray(ctp.sizeOfOMathArray() - 1, ctOMath);
    } else {
        // use a new run for each text segment so that text and equations alternate
        run = paragraph.createRun();
        run.setText(s + " ");
        System.out.println(s);
    }
}
should work.
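To see which segments the loop actually iterates over, the split expression can be tried in isolation. This is only a sketch: the sample string is a simplified stand-in for what SnuggleTeX emits (real output carries namespace attributes and more markup), and the class and method names are made up for illustration. The regular expression is the one from the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; the sample input is a simplified assumption.
public class SplitMathSegments {

    // Same expression as in the loop: split on whitespace that directly
    // precedes "<math" or directly follows "</math>".
    static final String SPLIT_REGEX = "\\s+(?=<math)|(?<=</math>)\\s+";

    // Labels every segment the way the loop's if/else would treat it.
    static List<String> describe(String mixed) {
        List<String> result = new ArrayList<>();
        for (String s : mixed.split(SPLIT_REGEX)) {
            result.add((s.startsWith("<math") ? "MATH: " : "TEXT: ") + s);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "First <math><mi>a</mi></math> middle <math><mi>b</mi></math> last";
        for (String line : describe(sample)) {
            System.out.println(line);
        }
        // prints:
        // TEXT: First
        // MATH: <math><mi>a</mi></math>
        // TEXT: middle
        // MATH: <math><mi>b</mi></math>
        // TEXT: last
    }
}
```

Each TEXT segment corresponds to one new run and each MATH segment to one appended CTOMath, so the paragraph ends up with as many equations as the input contained.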