How do I fix "missing character" warnings when converting from docx to pdf using Pandoc and LaTeX?

Question

I have several thousand Khmer-language .docx files and would like to convert them to .pdf format using Pandoc.

I installed Pandoc using MacPorts. Pandoc requires LaTeX for PDF conversion, so I installed MacTeX. Installation appears to have gone properly, and I've been able to convert English-language .docx files into .pdf without difficulty.

When I try to convert a Khmer-language file (you can find an example at https://briancroxall.net/pandoc/transcription.docx) to PDF, I use the following command:

pandoc transcription.docx  -s -o transcript.pdf

I get the following error:

Error producing PDF.
! Package inputenc Error: Unicode character អ (U+17A2)
(inputenc)                not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.64 ...�នៅសម័យប៉ុល ពត។}

Try running pandoc with --pdf-engine=xelatex.

Attempt 2

Following this suggestion, I use this command:

pandoc --pdf-engine=xelatex transcription.docx  -s -o transcript.pdf

Pandoc then throws an error message for every Khmer character in the text:

[WARNING] Missing character: There is no អ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ្ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ន in font [lmroman10-bold]:mapping=tex-text;!
...

A PDF is produced by this process (see https://briancroxall.net/pandoc/transcript.pdf), but it is largely empty.
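With thousands of files, reading every warning by hand is impractical. The following is a minimal sketch, assuming a POSIX shell and that pandoc keeps printing these warnings in exactly the format shown above (other pandoc versions may word them differently); the helper name missing_chars is hypothetical. It prints each distinct missing character once, which makes it easier to see whether a font lacks a whole script or only a few symbols.

```shell
# Sketch: list each distinct character that XeLaTeX reported as missing.
# Reads pandoc's stderr on stdin and relies on the warning format
# "[WARNING] Missing character: There is no X in font ...".
# missing_chars is a hypothetical helper name.
missing_chars() {
  # Byte-wise matching (LC_ALL=C) avoids locale-dependent regex behaviour;
  # the text around the reported character is plain ASCII.
  LC_ALL=C sed -n 's/^\[WARNING\] Missing character: There is no \(.*\) in font .*$/\1/p' |
    LC_ALL=C sort -u   # byte-wise sort so distinct characters never collapse
}
```

For example: pandoc --pdf-engine=xelatex transcription.docx -s -o transcript.pdf 2>&1 | missing_chars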

As best as I can tell, this suggests that Khmer characters are not available in the LaTeX engine that I'm trying to use for the conversion. Whether or not that is so, how can I manage this file conversion successfully?

Answer

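The warnings say that the default font (Latin Modern, lmroman10) has no Khmer glyphs, so tell XeLaTeX to use a font that does, such as Khmer MN on macOS:

$ pandoc --pdf-engine=xelatex transcription.docx \
    -V 'mainfont:Khmer MN' -s -o transcript.pdf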

This produces a PDF with Khmer characters and no error messages.
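Because the goal is to convert several thousand files, the working command can be wrapped in a loop. This is a minimal sketch, not tested against the original files: it assumes a POSIX shell, that pandoc and xelatex are on the PATH, and that the Khmer MN font is installed; the function name convert_khmer_docx and the per-file .log convention are hypothetical choices for illustration. Since pandoc can still produce a PDF when glyphs are missing (as in Attempt 2), it also flags any file whose log mentions "Missing character".

```shell
#!/bin/sh
# Sketch: batch-convert every .docx in a directory to PDF with XeLaTeX and a
# Khmer-capable main font. Assumes pandoc, xelatex and the "Khmer MN" font
# are installed; convert_khmer_docx is a hypothetical helper name.
convert_khmer_docx() {
  dir=${1:-.}
  failed=0
  for src in "$dir"/*.docx; do
    # If no .docx files exist, the glob stays literal; skip it.
    [ -e "$src" ] || continue
    out=${src%.docx}.pdf
    log=${src%.docx}.log
    # Keep pandoc's warnings for each file in its own log.
    if pandoc --pdf-engine=xelatex -V 'mainfont:Khmer MN' -s \
        -o "$out" "$src" 2>"$log"; then
      # A PDF can be produced even when glyphs are missing, so check the log.
      if grep -q 'Missing character' "$log"; then
        echo "WARN: missing glyphs in $src (see $log)" >&2
      fi
    else
      echo "FAIL: $src (see $log)" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every conversion succeeded.
  [ "$failed" -eq 0 ]
}
```

To use it, source the file and call it with a folder, e.g. convert_khmer_docx path/to/khmer-docs; each PDF and log is written next to its source .docx.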

The PDF does seem to have some issues: some phrases in Khmer run off the margin of the page. I think this is due to word segmentation. Khmer is usually written without spaces between words, so line breaks depend on identifying word boundaries, which Word is equipped to handle but which gets lost in the conversion to PDF.