What are the ways of checking if a piece of text in a PDF document is bold using iTextSharp

Question

I have an application that extracts headings from PDF files. The documents it is supposed to work with all have a more or less consistent structure and formatting, so telling whether a text chunk is bold is very important. Recently I came across a bunch of files in which some chunks visually appear bold but do not have a "bold" part in the string representation of the font. The SO thread "How can I get text formatting with iTextSharp" helped me understand that there is one more way of making text appear bold. However, in my case calling GetTextRenderMode() does not help either, as it returns 0, as if it were normal text. So are there any other ways of making text appear bold, and is it possible to detect them using iTextSharp?

Recommended answer

You are making the assumption that the font inside your PDF file knows if it's bold or not. Let's take a look inside and check if your assumption is correct.

This is what the subset JOJJAH of the font TT116t00 looks like when you look at the internals of the PDF file you have shared:

We see that the font is of subtype /TrueType, we see that the /ItalicAngle is 0, and... we see that the 3rd bit of the /Flags is set. Let's check the PDF reference to find out what this tells us:

I quote:

The font contains glyphs outside the Adobe standard Latin character set.
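To make the flag lookup concrete, here is a minimal sketch in Python of how a font descriptor's /Flags integer can be decoded. It does not use iTextSharp. The bit positions follow the font flags table in the PDF reference, while the function name `decode_font_flags` and the sample value `4` (only the third bit set) are assumptions made for illustration.

```python
# Font descriptor flag bits as listed in the PDF reference.
# Bit positions are 1-based, counting from the low-order bit.
FONT_FLAGS = {
    1: "FixedPitch",
    2: "Serif",
    3: "Symbolic",     # glyphs outside the Adobe standard Latin character set
    4: "Script",
    6: "Nonsymbolic",
    7: "Italic",
    17: "AllCap",
    18: "SmallCap",
    19: "ForceBold",
}


def decode_font_flags(flags):
    """Return the names of all flags whose bit is set in the /Flags value."""
    return [name for bit, name in FONT_FLAGS.items() if flags & (1 << (bit - 1))]


# Hypothetical value: only the 3rd bit is set.
print(decode_font_flags(4))  # ['Symbolic']
```

Note that none of these flags states that a font is bold. The closest one, ForceBold, is a rendering hint for small text sizes, and a font whose glyphs are simply drawn heavy does not need to set it.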

The glyphs look bold, because the glyphs are drawn in a way that they appear bold. You see the font as bold because you are human. However, when a machine looks at the font, it doesn't have a clue that the font is bold. A machine just follows the instructions stored in the /FontFile2 stream.

In short: iTextSharp doesn't have any indications that the font is bold.
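As a rough illustration of that conclusion, the sketch below gathers the hints a program could look for once the relevant values have been read from the PDF: a font name containing "Bold", the ForceBold flag, a /FontWeight entry of 700 or more, and text render mode 2 (fill then stroke). It is plain Python, not iTextSharp code. The function `bold_hints`, its parameters, the subset-style name `JOJJAH+TT116t00`, and the flags value `4` are all assumptions. /FontWeight is an optional font descriptor entry that the answer does not mention for this file.

```python
def bold_hints(font_name, flags=0, font_weight=None, render_mode=0):
    """Collect metadata hints that a piece of text may be bold.

    An empty result does not prove the text is not bold: the glyph
    outlines themselves may simply be drawn heavy.
    """
    hints = []
    if "bold" in font_name.lower():
        hints.append("font name contains 'Bold'")
    if flags & (1 << 18):  # bit 19 of /Flags is ForceBold
        hints.append("ForceBold flag is set")
    if font_weight is not None and font_weight >= 700:
        hints.append("/FontWeight is 700 or higher")
    if render_mode == 2:  # fill then stroke, which thickens the glyphs
        hints.append("text render mode is 2 (fill then stroke)")
    return hints


# Values modelled on the font discussed above (assumed, not read from the file).
print(bold_hints("JOJJAH+TT116t00", flags=4, render_mode=0))  # []
```

With these assumed values every check comes back empty, which matches the situation described: the boldness exists only in the glyph outlines.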
