getBaseFont() returns null in PDFBox


Question

I extract text from a PDF file using PDFBox. When I ask for the font of some of the text, I get null, and I don't know why. For other text in the same file I do get the font.

I am using this code:

    protected void processTextPosition(TextPosition text) {
        String font = text.getFont().getBaseFont(); // is null for some text
    }

Recommended answer

String font=text.getFont().getBaseFont(); // equal null

PDFont.getBaseFont is implemented to simply return the value of the BaseFont entry of the respective font dictionary.

Not all fonts provide a BaseFont entry in their font dictionary, though. In that case the method returns null.

According to the PDF specification, you can only expect a font to have that entry if it is a Type0 (composite), Type1, or TrueType font. Type3 fonts don't have that entry.

This actually makes sense: Type3 fonts are defined purely in PDF terms, down to their glyph definitions (each glyph is a PDF content stream), so there is no base font to refer to.

In the case of Type0 (composite) fonts, you might actually consider looking at the descendant font (using PDType0Font.getDescendantFont()) and inspecting its BaseFont entry, because the entry of the composite font is specified as a composition of the descendant's base font name and a CMap name.
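The lookup order described above can be sketched without any PDF library. The snippet below is only an illustration: it models a font dictionary as a plain Map keyed by PDF entry name (Subtype, BaseFont, DescendantFonts), which is an assumption made for this example and not PDFBox's object model. The class and method names (BaseFontResolver, resolveBaseFont) are hypothetical.

```java
import java.util.List;
import java.util.Map;

/**
 * Sketch only: font dictionaries are modeled as plain maps keyed by PDF
 * entry name. This is a stand-in for illustration, not PDFBox's API.
 */
final class BaseFontResolver {

    private BaseFontResolver() {
    }

    /**
     * For Type0 fonts, prefers the descendant font's BaseFont; otherwise
     * returns the dictionary's own BaseFont. Returns null if neither exists.
     */
    static String resolveBaseFont(Map<String, Object> fontDict) {
        if (fontDict == null) {
            return null;
        }
        if ("Type0".equals(fontDict.get("Subtype"))) {
            String descendantName = descendantBaseFont(fontDict.get("DescendantFonts"));
            if (descendantName != null) {
                return descendantName;
            }
        }
        Object own = fontDict.get("BaseFont");
        return own instanceof String ? (String) own : null;
    }

    /** Reads BaseFont from the first entry of a DescendantFonts list, if any. */
    private static String descendantBaseFont(Object descendants) {
        if (!(descendants instanceof List) || ((List<?>) descendants).isEmpty()) {
            return null;
        }
        Object first = ((List<?>) descendants).get(0);
        if (!(first instanceof Map)) {
            return null;
        }
        Object name = ((Map<?, ?>) first).get("BaseFont");
        return name instanceof String ? (String) name : null;
    }
}
```

With PDFBox itself, the same idea would mean checking whether the font is a PDType0Font and asking its descendant font for the base font name before falling back to the font's own value. A Type3 dictionary, which has no BaseFont entry, yields null here as well.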

While all of the above is true for PDFs that follow the specification, you have to get used to seeing PDFs in the wild that do not follow the spec 100%. Because the base font entry is not always strictly necessary for PDF handling in general, there surely are PDFs in the wild that omit it even where the specification expects it.

Thus, always reckon with null values (or values not following the spec) here.
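As one way of reckoning with null, the following sketch groups extracted text by font name and files everything without a usable name under a placeholder key. It is a minimal, library-free illustration: FontUsage, record, and the "(no BaseFont)" placeholder are hypothetical names chosen for this example, and the baseFont argument stands for whatever text.getFont().getBaseFont() returned.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: counts extracted characters per font name, tolerating null names. */
final class FontUsage {

    /** Placeholder key used when a font has no usable BaseFont value. */
    static final String UNKNOWN = "(no BaseFont)";

    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Adds the length of the given text to the tally for the given font name. */
    void record(String baseFont, String text) {
        // A missing or empty name is expected for some fonts, so it is not an error.
        String key = (baseFont == null || baseFont.isEmpty()) ? UNKNOWN : baseFont;
        int length = (text == null) ? 0 : text.length();
        counts.merge(key, length, Integer::sum);
    }

    /** Returns the tally in first-seen order of font names. */
    Map<String, Integer> counts() {
        return counts;
    }
}
```

Inside a processTextPosition override, the call would be made once per text position, passing the possibly null font name straight through instead of dereferencing it.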
