pandoc-generated docx misses italic variables in equations

Question

I have the following segment of Markdown with embedded LaTeX equations:

# Fisher's linear discriminant

\newcommand{\cov}{\mathrm{cov}}
\newcommand{\A}{\mathrm{A}}
\renewcommand{\B}{\mathrm{B}}
\renewcommand{\T}{^\top}

The first method to find an optimal linear discriminant was proposed by Fisher
(1936), using the ratio of the between-class variance to the within-class variance
of the projected data, $d(\vec x)$, as a criterion. Expressed in terms of the
sample properties, the $p$-dimensional centroids $\bar {\vec x}_\A$ and
$\bar {\vec x}_\B$ and the $p \times p$ covariance matrices
$S_\A = \cov_i ( \vec x_{\A i} )$ and $S_\B = \cov_i ( \vec x_{\B i} )$, the
optimal direction is given by
$$
\vec w = \left ( \frac{ S_\A + S_\B }{2} \right ) ^{-1}
~ ( \bar {\vec x}_\B - \bar {\vec x}_\A ).
$$

When I convert it with pandoc to LaTeX and compile it with xelatex, I get the expected text with nicely rendered math. When I convert it with pandoc to MS Word using

pandoc test.text -o test.docx

and open it in MS Office Word 2007, I get the following:

Only those parts of the equations that are symbols or upright text get rendered correctly, while variable names in italics are replaced by a question mark in a box.

How can I get this to render correctly?

Recommended answer

In Word 2007, I see a result similar to yours, except that here, I don't see the "question marks in boxes" characters, just space.

If I then take one of the expressions, and use your trick of going to linear display and back, the characters reappear for that expression.

If I save and re-open, the other expressions still do not display correctly, but if I save and look at the XML, I notice that

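  1. The math font has changed to Cambria Math.
  2. Additional run-property (w:rPr) XML specifying the Cambria Math font has been inserted into many of the runs (m:r) within the oMath elements, even in the oMath expressions that do not display correctly. However, in the oMath expression that now displays correctly, this additional XML has been applied to every run. In the others, it has only been applied to some runs (I think I can see the pattern, but I am out of time right now...).
  3. If I manually add the XML to the other runs and re-open the document, the expressions display correctly. Or at least, they did in the one case I tried.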
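The inspection described in the list above can be roughly automated. The following is a minimal sketch, not a definitive tool: it assumes the equations live in word/document.xml inside the .docx zip, and the function names `count_math_runs` and `read_document_xml` are hypothetical helpers introduced only for illustration. It counts how many math runs (m:r) exist and how many of them carry an explicit w:rFonts element.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by Office Open XML for math (m:) and WordprocessingML (w:).
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_math_runs(document_xml):
    """Return (total math runs, math runs with an explicit w:rPr/w:rFonts)."""
    root = ET.fromstring(document_xml)
    runs = list(root.iter(f"{{{M}}}r"))
    with_font = [
        run for run in runs
        if run.find(f"{{{W}}}rPr/{{{W}}}rFonts") is not None
    ]
    return len(runs), len(with_font)


def read_document_xml(docx_path):
    """Read word/document.xml out of the .docx zip container (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")
```

Calling `count_math_runs(read_document_xml("test.docx"))` before and after toggling an equation in Word should show whether the second number moves toward the first, which is the pattern the answer describes.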

Since Word 2010 displays the results correctly, I can only assume that it does not rely on these explicit font settings, whereas Word 2007 does. This doesn't really help you yet, because altering all those run elements would be even harder than what you are already doing. But it is possible that a default style/font needs to be set, either somewhere higher in the XML hierarchy, or perhaps elsewhere in the .zip (perhaps in fontTable.xml or styles.xml). I'm not familiar enough with Word's XML structures to guess what, if anything, might be missing, but I may be able to have a look tomorrow.

I suppose another possibility is that you just have to have all these extra rPr elements for this to work in Word 2007, which would suggest that pandoc may have been written for Word 2010, not 2007. (I don't know anything about the tool).

As an example, where you have

<m:r>
  <m:t>(</m:t>
</m:r>

you would need

<m:r>
  <w:rPr>
    <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" />
  </w:rPr>
  <m:t>(</m:t>
</m:r>
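If editing every run by hand is impractical, the same change could be scripted. The sketch below is an assumption-laden illustration rather than a verified fix: it only touches math runs (m:r) that have no w:rPr at all, inserts the w:rFonts element shown above, and rewrites word/document.xml into a new .docx. The names `add_math_font` and `patch_docx` are hypothetical, and ElementTree may serialise the XML differently from pandoc (for example, dropping unused namespace declarations), so the result should be checked in Word before relying on it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def add_math_font(document_xml, font="Cambria Math"):
    """Add w:rPr/w:rFonts to math runs lacking w:rPr; return (xml bytes, count)."""
    # Keep the document's own namespace prefixes (w:, m:, ...) when serialising.
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(document_xml), events=("start-ns",)):
        ET.register_namespace(prefix, uri)
    root = ET.fromstring(document_xml)
    patched = 0
    for run in root.iter(f"{{{M}}}r"):
        if run.find(f"{{{W}}}rPr") is not None:
            continue  # run already has explicit properties; leave it alone
        rpr = ET.Element(f"{{{W}}}rPr")
        ET.SubElement(rpr, f"{{{W}}}rFonts",
                      {f"{{{W}}}ascii": font, f"{{{W}}}hAnsi": font})
        # w:rPr goes before m:t, but after a leading m:rPr if one is present.
        has_math_props = len(run) > 0 and run[0].tag == f"{{{M}}}rPr"
        run.insert(1 if has_math_props else 0, rpr)
        patched += 1
    return ET.tostring(root, encoding="utf-8"), patched


def patch_docx(src_path, dst_path):
    """Copy a .docx, replacing word/document.xml with the patched version."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data, _ = add_math_font(data)
            dst.writestr(item, data)
```

A call such as `patch_docx("test.docx", "test-word2007.docx")` would write a patched copy and leave the pandoc output untouched, so the two files can be compared in Word 2007.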
