utf8 in libharu: is embedding fonts really necessary?

Question

I'm trying to support as much Unicode as I can in the PDF files I'm writing. I want to be able to output utf8 strings and have them display correctly in the PDF.

I see in the libharu encodings documentation (https://github.com/libharu/libharu/wiki/Encodings) that there are many single-byte code pages I can access, and special functions for accessing multi-byte code pages if I want Chinese, Japanese, and Korean. But my understanding is that if I wanted to use all of those pages and functions to write arbitrary utf8 strings, I'd have to write a bunch of code to break my utf8 strings into segments that each use a specific code page, do whatever code page swapping is necessary, and reverse-map each segment from utf8 to the given code page before outputting it. That seems like a lot of error-prone work compared to just being able to say "write this utf8 string".
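To give a sense of the segmentation work described above, here is a minimal sketch in C that splits a utf8 string into runs by script. It is only an illustration: the names `script_t`, `segment_t`, `utf8_decode`, `classify`, and `split_by_script` are hypothetical and not part of libharu, the script classes are a deliberately coarse assumption based on Unicode block ranges, and the remaining step of converting each run to a libharu code page is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical script classes; a real mapping would follow the code pages in use. */
typedef enum { SCRIPT_LATIN, SCRIPT_GREEK, SCRIPT_CYRILLIC, SCRIPT_OTHER } script_t;

/* One run of bytes in the input that all belong to the same script class. */
typedef struct { size_t start; size_t len; script_t script; } segment_t;

/* Decode one UTF-8 sequence from s (n bytes available, n >= 1).
   Returns the number of bytes consumed; malformed input consumes
   one byte and yields U+FFFD. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }

    if (len > n) { *cp = 0xFFFD; return 1; }
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Coarse classification by Unicode block (an assumption for illustration). */
static script_t classify(uint32_t cp)
{
    if (cp < 0x0250) return SCRIPT_LATIN;                       /* Basic Latin .. Latin Extended-B */
    if (cp >= 0x0370 && cp <= 0x03FF) return SCRIPT_GREEK;      /* Greek and Coptic */
    if (cp >= 0x0400 && cp <= 0x04FF) return SCRIPT_CYRILLIC;   /* Cyrillic */
    return SCRIPT_OTHER;
}

/* Split s into maximal same-script runs. Writes at most max segments
   to out and returns how many were written. */
static size_t split_by_script(const char *s, segment_t *out, size_t max)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = strlen(s), pos = 0, count = 0;

    while (pos < n) {
        uint32_t cp;
        size_t used = utf8_decode(p + pos, n - pos, &cp);
        script_t sc = classify(cp);

        if (count > 0 && out[count - 1].script == sc) {
            out[count - 1].len += used;          /* extend the current run */
        } else {
            if (count == max) break;             /* output array is full */
            out[count].start = pos;
            out[count].len = used;
            out[count].script = sc;
            count++;
        }
        pos += used;
    }
    return count;
}
```

Note that spaces and ASCII punctuation fall into the Latin class here, so a mixed sentence produces more runs than one might expect. That is part of why this approach is fiddly.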

To be able to write utf8 strings I'm using this code:

/* Create the document and switch it to UTF-8 text handling. */
myPdf = HPDF_New( PdfErrorHandler, NULL );
HPDF_UseUTFEncodings( myPdf );
HPDF_SetCurrentEncoder( myPdf, "UTF-8" );
/* HPDF_TRUE embeds the TrueType font data in the PDF. */
const char *f = HPDF_LoadTTFontFromFile( myPdf, "path/to/verdana.ttf", HPDF_TRUE );
HPDF_Font myFont = HPDF_GetFont( myPdf, f, "UTF-8" );
/* ... go on to use myFont to write various text strings */

That works: I can write utf8 strings with accented Latin, Cyrillic, and Greek characters, and they show correctly in the PDF.

However, because I used HPDF_TRUE to embed the font in my file, the file size increases significantly. I am in fact using four fonts (verdana.ttf, verdanab.ttf, verdanai.ttf, and verdanaz.ttf), and they add over 600k to my file size, compared to when I was using the "built-in" libharu fonts (which leave the file tiny, just a few k).

(I did try using HPDF_FALSE to not embed the fonts, but then my files open with random Latin characters.)

I'm trying to understand conceptually why it's necessary to embed fonts in my PDF if I'm using a font like verdana that is going to be on the end user's system anyway. (I don't even care if it's verdana; any standard sans serif font would do.) I've certainly created lots of PDF files by other means (e.g., exporting from Word) containing Greek, Cyrillic, Chinese, and other characters, and yet they are small. So is this embedding-to-use-utf8 requirement just a quirk of libharu?

Plus, even with that 600k bulk, my files made with libharu show Chinese characters as blocks. I read on a libharu documentation page that libharu only supports one- and two-byte utf8 sequences, which covers almost everything except Chinese, Japanese, and Korean. So does this mean I'm embedding verdana.ttf, the majority of which is Chinese, Japanese, and Korean glyphs, and I can't even access them?
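If the one- and two-byte limit mentioned above holds (that is my reading of the documentation, not something verified against the libharu source), it can be checked up front whether a string stays within it. The sketch below is plain C with no libharu calls; the function name `first_long_sequence` is hypothetical, and it assumes the input is valid utf8.

```c
#include <stddef.h>

/* Returns the byte offset of the first lead byte that starts a 3- or
   4-byte UTF-8 sequence (code point U+0800 or above), or -1 if the
   string contains only 1- and 2-byte sequences.
   Assumes s is valid, NUL-terminated UTF-8. */
static long first_long_sequence(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    long i;

    for (i = 0; p[i] != 0; i++) {
        /* 0x00..0x7F: 1-byte sequence, 0x80..0xBF: continuation byte,
           0xC0..0xDF: 2-byte lead, 0xE0 and above: 3- or 4-byte lead. */
        if (p[i] >= 0xE0) {
            return i;
        }
    }
    return -1;
}
```

A caller could use the returned offset to warn about, or substitute, text that would otherwise show up as blocks in the output.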

In any case, Chinese, Japanese, and Korean are not important for my current application. For the two-byte utf8 sequences alone, I'm trying to understand whether there's a way to use them in libharu without having to embed big fonts in my file.

Recommended answer

Per the PDF specification, if you do not embed a font, a conforming reader will try to load the same font from the user's system.

If the font is not found, the reader falls back and tries to display the character with another font. If the replacement font does not have a corresponding character at that encoding position, an unpredictable character will appear there.

It is always recommended to embed a subset of the font, unless you want to allow users to edit your document, which is a rare use case for PDF docs.
