Ghostscript PDF to PDF/A conversion font issues


Problem description

I am exploring tools to convert PDF documents to PDF/A. Ghostscript seems to offer out-of-the-box support for such a conversion. One issue is that some TrueType fonts that are part of the original PDF document are not converted correctly. If I copy text from the converted PDF/A document and paste it into Notepad, the pasted text is garbled.

Text from the original document can be copied to Notepad just fine.

I am using the following command:

gswin64 -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=FilteredOutput.pdf Filtered1Page.pdf

I have uploaded a sample 1 page source PDF in Google Drive: SampleInput

A sample output PDF/A document generated by the command is in Google Drive here: SampleOutput

Running the above command on this PDF on a Windows machine will reproduce the issue.

Are there any settings or commands that make the PDF/A conversion handle this properly?

Solution

Copy and paste from a PDF is not guaranteed to work. Subset fonts will not have a usable Encoding (such as ASCII or UTF-8), in which case they will only be amenable to cut/paste/search if they have an associated ToUnicode CMap. Many PDF files do not contain ToUnicode CMaps.

Of course, the PDF/A specification states (oddly, in my opinion) that you should not use subset fonts, but it's not always possible to tell whether a font is a subset (not all creators follow the XXXXXX+ prefix convention), and even if the font isn't a subset there still isn't any guarantee that its Encoding is a usable one.
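As a rough illustration of the prefix convention mentioned above, here is a minimal Python sketch that checks whether a font name starts with a subset tag (six uppercase letters followed by "+"). The function name and the example tag ABCDEF are made up for illustration. As noted, a name without the tag can still belong to a subset font, so a False result proves nothing.

```python
import re

# Subset tag convention: six uppercase letters, then "+", before the font name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def looks_like_subset(font_name: str) -> bool:
    """Return True if the font name carries a subset tag prefix.

    This is only a heuristic: producers that ignore the convention
    will yield False even for fonts that really are subsets.
    """
    # Tolerate a leading "/" as it appears in PDF name objects.
    return bool(SUBSET_TAG.match(font_name.lstrip("/")))


# "ABCDEF" is a hypothetical tag, not taken from the sample file.
for name in ["ABCDEF+FreeSansBold", "Arial,Bold"]:
    print(name, looks_like_subset(name))
```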

Looking at the file you have posted, it does not contain one of the fonts it uses (Arial,Bold) and so Ghostscript substitutes with DroidSansFallback, and the font it does contain (FreeSansBold) is a subset (FWIW this font doesn't actually seem to be used....). The fallback font is a CIDFont, so there is no real prospect of the text being 'correct'.

I believe that if you make a real font available to Ghostscript to replace Arial,Bold then it will probably work correctly. This would also fix the rather more obvious problem of the spacing of the characters being incorrect (in one place, wildly incorrect), which is caused by the fallback font having different widths to the original.

NB: as the warning messages have already told you, don't use -dUseCIEColor.
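The two suggestions above (give Ghostscript a real font, and drop -dUseCIEColor) could be combined along these lines. This is only a sketch: the helper name build_pdfa_command and the directory C:/Windows/Fonts are assumptions, the other switches are copied unchanged from the question, and -sFONTPATH only tells Ghostscript where to look for font files. Whether the name Arial,Bold then resolves to the real font depends on the local font setup, so check the output rather than assume it worked.

```python
from typing import List, Optional


def build_pdfa_command(src: str, dst: str, font_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for the conversion, without -dUseCIEColor."""
    args = [
        "gswin64",
        "-dPDFA",
        "-dBATCH",
        "-dNOPAUSE",
        "-sProcessColorModel=DeviceCMYK",
        "-sDEVICE=pdfwrite",
        "-sPDFACompatibilityPolicy=1",  # kept exactly as in the question
    ]
    if font_dir is not None:
        # Extra directory for Ghostscript to search for font files (assumed path).
        args.append("-sFONTPATH=" + font_dir)
    args.append("-sOutputFile=" + dst)
    args.append(src)
    return args


cmd = build_pdfa_command(
    "Filtered1Page.pdf", "FilteredOutput.pdf", font_dir="C:/Windows/Fonts"
)
print(" ".join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) to actually run the conversion; the sketch only prints it so the command can be inspected first.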

The fact that you cannot copy/paste/search a PDF does not mean that it is not a valid PDF/A-1b file, though, so this does not mean that the creation (NOT conversion) of the PDF/A-1b is not 'proper'.
