PDFKitten is highlighting at the wrong position

Question

I am using PDFKitten to search for strings within PDF documents and highlight the results. FastPDFKit or any other commercial library is not an option, so I stuck with the one closest to my requirements.

As you can see in the screenshot, I searched for the string "in", which is always highlighted correctly except for the last occurrence. I also have a more complex PDF document where the highlight box for "in" is wrong in nearly 40% of the cases.

I read the whole source code and checked the issue tracker, but apart from line-height problems I found nothing regarding the width calculation. For the moment I don't see any pattern in where the calculation goes wrong or could go wrong, and I hope that someone else has had a problem similar to mine.

My current suspicion is that the coordinates and character widths are calculated incorrectly somewhere in the font classes or in RenderingState.m. The project is very complex, and maybe one of you has had a similar problem with PDFKitten in the past.

I used the original sample PDF document from PDFKitten for my screenshot.

Recommended answer

This might be a bug in PDFKitten when calculating the width of characters whose character identifier (CID) does not coincide with their Unicode character code.

appendPDFString in StringDetector works with two strings when processing string data:

// Use CID string for font-related computations.
NSString *cidString = [font stringWithPDFString:string];

// Use Unicode string to compare with user input.
NSString *unicodeString = [[font stringWithPDFString:string] lowercaseString];

stringWithPDFString in Font transforms the sequence of character identifiers in its argument into a Unicode string.

Thus, in spite of its name, cidString is not a sequence of character identifiers but a sequence of Unicode characters. Nonetheless, its entries are passed as the argument of didScanCharacter, which Scanner implements to advance the position by the character width: it passes the value to widthOfCharacter in Font to determine the character width, and that method (according to its comment, "Width of the given character (CID) scaled to fontsize") expects its argument to be a character identifier.

So, if the CID and the Unicode character code don't coincide, the wrong character width is determined and the position of any following character cannot be trusted. In the case at hand, the /fi ligature has a CID of 12, which is very different from its Unicode code point 0xFB01.
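To make the failure mode concrete, here is a minimal model of the width lookup, written in Python rather than taken from PDFKitten. Only the pairing of CID 12 with U+FB01 comes from the case above. The width values, the fallback width of 0, and the choice to let ordinary letters use their Unicode value as CID are assumptions made for illustration.

```python
# Hypothetical font data: glyph widths keyed by character identifier (CID),
# in 1/1000 text-space units. The numbers are invented; only the pairing of
# CID 12 with the "fi" ligature (U+FB01) is taken from the discussion.
WIDTHS_BY_CID = {12: 556, ord("i"): 278, ord("n"): 556}
CID_TO_UNICODE = {12: 0xFB01, ord("i"): ord("i"), ord("n"): ord("n")}
DEFAULT_WIDTH = 0  # assumed fallback when the key is not in the table


def width_of_character(key):
    """Look up a glyph width; the table expects a CID as key."""
    return WIDTHS_BY_CID.get(key, DEFAULT_WIDTH)


def start_positions(cids, use_unicode_as_key):
    """Return the x position at which each scanned character starts."""
    positions = []
    x = 0
    for cid in cids:
        positions.append(x)
        # The suspected bug: the Unicode value is used where a CID is expected.
        key = CID_TO_UNICODE[cid] if use_unicode_as_key else cid
        x += width_of_character(key)
    return positions


# "fi" ligature followed by "n", "i", "n"
scanned = [12, ord("n"), ord("i"), ord("n")]
correct = start_positions(scanned, use_unicode_as_key=False)
buggy = start_positions(scanned, use_unicode_as_key=True)
print(correct)  # [0, 556, 1112, 1390]
print(buggy)    # [0, 0, 556, 834]
```

In this model every character after the ligature is shifted by the ligature's missing width, which would match a highlight that is correct everywhere except after such a glyph.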

I would propose enhancing PDFKitten to also define a didScanCID method in StringDetector, which appendPDFString should call alongside didScanCharacter for each processed character, forwarding its CID. Scanner should then use this new method, instead of didScanCharacter, to calculate the width by which to advance its cursor.
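The idea behind this proposal can be sketched as a small Python model; this is not PDFKitten code. It keeps the Unicode text for matching the keyword and the CID for the width lookup. The function name find_highlights, the width values, and the simplification that each CID maps to exactly one Unicode character are assumptions made for illustration.

```python
# Hypothetical font data: widths keyed by CID (1/1000 text-space units) and a
# CID-to-Unicode map in which each CID yields exactly one character.
WIDTHS_BY_CID = {12: 556, ord("i"): 278, ord("n"): 556}
CID_TO_UNICODE = {12: "\ufb01", ord("i"): "i", ord("n"): "n"}


def find_highlights(cids, keyword):
    """Return (x, width) boxes for each occurrence of keyword.

    Each scanned character contributes its Unicode text for matching
    (the role of didScanCharacter) and its CID for the width lookup
    (the role of the proposed didScanCID).
    """
    text = ""
    starts = []  # x position at which each character starts
    x = 0
    for cid in cids:
        text += CID_TO_UNICODE[cid]
        starts.append(x)
        x += WIDTHS_BY_CID[cid]  # width keyed by CID, never by Unicode value
    starts.append(x)  # end position of the last character

    boxes = []
    i = text.find(keyword)
    while i != -1:
        j = i + len(keyword)
        boxes.append((starts[i], starts[j] - starts[i]))
        i = text.find(keyword, i + 1)
    return boxes


# "fi" ligature followed by "n", "i", "n"; search for "in"
boxes = find_highlights([12, ord("n"), ord("i"), ord("n")], "in")
print(boxes)  # [(1112, 834)]
```

Because the cursor advances by CID-based widths, the box for "in" starts after the full width of the ligature and the preceding "n".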

This should be triple-checked first, though. Maybe some widthOfCharacter implementations (there are different ones for different font types) expect the argument to be a Unicode code after all, in spite of the comment...

(Sorry if I used the wrong vocabulary here or there, I'm a Java guy... :))
