How do I fix "missing character" warnings when converting from docx to pdf using Pandoc and LaTeX?

Question

I have several thousand Khmer-language .docx files and would like to convert them to .pdf format using Pandoc.

I installed Pandoc using MacPorts. Pandoc requires LaTeX for PDF conversion, so I installed MacTeX. Installation appears to have gone properly, and I've been able to convert English-language .docx files into .pdf without difficulty.

When I try to convert a Khmer-language file (you can find an example at https://briancroxall.net/pandoc/transcription.docx) to PDF, I use the following command:

pandoc transcription.docx  -s -o transcript.pdf

I get the following error:

Error producing PDF.
! Package inputenc Error: Unicode character អ (U+17A2)
(inputenc)                not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.64 ...�នៅសម័យប៉ុល ពត។}

Try running pandoc with --pdf-engine=xelatex.

Attempt 2

Following this suggestion, I use this command:

pandoc --pdf-engine=xelatex transcription.docx  -s -o transcript.pdf

Pandoc then emits a warning for every Khmer character in the text:

[WARNING] Missing character: There is no អ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ្ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ន in font [lmroman10-bold]:mapping=tex-text;!
...

A PDF is produced by this process (see https://briancroxall.net/pandoc/transcript.pdf), but it is largely empty.

As best I can tell, this suggests that Khmer characters are not available in the font used by the LaTeX engine that I'm running for the conversion. Whether or not that is so, how can I manage this file conversion successfully?

Recommended answer

mb21's comment helped me figure this out. Since my system has a couple of Khmer fonts installed, I had to set mainfont to use one of them.
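
To find out which installed font families claim Khmer coverage, one option is fontconfig's fc-list. The sketch below is an illustration only: it assumes fc-list is on the PATH (it is not guaranteed to be on every macOS setup), and the khmer_fonts function name and FC_LIST override are hypothetical helpers, not part of Pandoc or MacTeX.

```shell
# List the families of installed fonts that declare support for Khmer ("km").
# Assumption: fontconfig's fc-list is installed; FC_LIST is a hypothetical
# override that lets you point at a different binary.
khmer_fonts() {
    "${FC_LIST:-fc-list}" :lang=km family | sort -u
}
```

Running khmer_fonts prints one family per line. Any family name it prints can be passed as mainfont, as in the command that follows.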

$ pandoc --pdf-engine=xelatex transcription.docx \
      -V 'mainfont:Khmer MN' -s -o transcription.pdf

This produces a PDF with Khmer characters and no error messages.
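
Since the question involves several thousand files, the same command can be wrapped in a loop. This is a minimal sketch, not something taken from the original answer: the convert_all function, the PANDOC and MAINFONT variables, and the directory arguments are hypothetical names chosen for illustration, and the font default assumes Khmer MN is installed.

```shell
#!/bin/sh
# Convert every .docx in a directory to PDF with XeLaTeX and a Khmer font.
# PANDOC and MAINFONT are hypothetical overrides; the defaults mirror the
# single-file command above.
PANDOC="${PANDOC:-pandoc}"
MAINFONT="${MAINFONT:-Khmer MN}"

convert_all() {
    src_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    failed=0
    for docx in "$src_dir"/*.docx; do
        [ -e "$docx" ] || continue   # skip the literal pattern if nothing matched
        pdf="$out_dir/$(basename "$docx" .docx).pdf"
        if ! "$PANDOC" --pdf-engine=xelatex "$docx" \
                -V "mainfont:$MAINFONT" -s -o "$pdf"; then
            echo "failed: $docx" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file converted.
    [ "$failed" -eq 0 ]
}
```

After sourcing the script, a call such as convert_all ./docx ./pdf writes one PDF per input file and reports any failures on stderr, so a single bad file does not stop the rest of the batch.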

The PDF does seem to have some issues in that some phrases in Khmer run off the margin of the page. I think this is due to segmentation issues that Word is equipped to deal with but that get messed up in conversion to PDF.
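
One thing that may help with the overflowing lines is to tell XeTeX to look for Khmer line-break opportunities, since Khmer does not put spaces between words. The sketch below writes a small LaTeX header that sets the XeTeX primitives \XeTeXlinebreaklocale and \XeTeXlinebreakskip. It has not been tested against this document, so whether it fixes the margins here is an assumption; the write_khmer_header function and the khmer-header.tex file name are hypothetical.

```shell
# Write a LaTeX header asking XeTeX to apply Khmer ("km") line-break rules.
# Assumption: the XeTeX build in use supports locale-based line breaking
# for Khmer; this is untested on the file from the question.
write_khmer_header() {
    cat > "$1" <<'EOF'
% Look for line-break opportunities using the Khmer locale.
\XeTeXlinebreaklocale "km"
% Allow a little stretch at those break points so lines can be justified.
\XeTeXlinebreakskip = 0pt plus 1pt
EOF
}
```

After write_khmer_header khmer-header.tex, the header can be passed to Pandoc by adding -H khmer-header.tex to the xelatex command shown earlier.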
