How to detect table start in iTextSharp?

Question

I am trying to convert a PDF to a CSV file. The PDF contains data in tabular format, with the first row as the header. I have reached the point where I can extract text from a cell, compare the baselines of text in the table, and detect new lines, but I need to compare table borders to detect the start of the table. I do not know how to detect and compare lines in a PDF. Can anyone help me?

Thanks!

Recommended answer

As you've seen (hopefully), PDFs have no concept of tables, just text placed at specific locations and lines drawn around them. There is no internal relationship between the text and the lines. This is very important to understand.

Knowing this, if all of the cells have enough padding, you can look for gaps between characters that are large enough, such as the width of three or more spaces. If the cells don't have enough spacing, this will unfortunately probably break.
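
The gap idea can be sketched roughly as follows. This is a library-agnostic Python illustration, not iTextSharp code: it assumes you have already collected, for one text line, each chunk's text together with its start and end x-coordinates (for example from whatever render callback your extractor provides). The `Chunk` shape, the `split_into_cells` name, and the thresholds are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical input shape: (text, x_start, x_end) for each chunk on ONE text line.
Chunk = Tuple[str, float, float]


def split_into_cells(chunks: List[Chunk],
                     space_width: float,
                     min_spaces: float = 3.0) -> List[str]:
    """Split one text line into cells wherever the horizontal gap between
    consecutive chunks is at least min_spaces * space_width."""
    cells: List[str] = []
    current = ""
    prev_end: Optional[float] = None
    # Process chunks left to right, regardless of the order they were drawn in.
    for text, x_start, x_end in sorted(chunks, key=lambda c: c[1]):
        if prev_end is not None:
            gap = x_start - prev_end
            if gap >= min_spaces * space_width:
                # Gap is wide enough to be treated as a cell boundary.
                cells.append(current.strip())
                current = ""
            elif gap >= 0.5 * space_width:
                # Smaller gap: assume an ordinary word space inside a cell.
                current += " "
        current += text
        prev_end = x_end
    if current:
        cells.append(current.strip())
    return cells


row = [("First", 10, 40), ("Name", 45, 75), ("Age", 150, 170)]
print(split_into_cells(row, space_width=5))
```

The threshold of three space widths comes from the answer above; as it warns, tightly packed cells will merge under this heuristic, so the value usually needs tuning per document.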

You could also look at every line in the PDF and try to figure out what represents your "table-like" lines. See this answer for how to walk every token on a page to see what's being drawn.
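
As a rough illustration of what walking the tokens might look like, here is a small Python sketch that scans an already-decoded content stream for the standard path operators `m` (move to), `l` (line to), and `re` (rectangle) and collects the resulting segments. It is not the code from the linked answer and does not use iTextSharp; the function names are hypothetical. It also makes simplifying assumptions: tokens are whitespace-separated, strings, arrays, and inline images are not handled, coordinate transformations (`cm`) are ignored, and it does not check whether a path is actually stroked.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def collect_segments(content: str) -> List[Segment]:
    """Collect straight segments built with the m, l and re path operators."""
    operands: List[float] = []
    segments: List[Segment] = []
    current: Optional[Tuple[float, float]] = None
    for token in content.split():
        try:
            operands.append(float(token))  # numeric operand: keep for the next operator
            continue
        except ValueError:
            pass  # not a number, so treat it as an operator (or something we skip)
        if token == "m" and len(operands) >= 2:
            current = (operands[-2], operands[-1])
        elif token == "l" and len(operands) >= 2 and current is not None:
            x, y = operands[-2], operands[-1]
            segments.append((current[0], current[1], x, y))
            current = (x, y)
        elif token == "re" and len(operands) >= 4:
            x, y, w, h = operands[-4:]
            # A rectangle contributes its four edges.
            segments += [(x, y, x + w, y), (x + w, y, x + w, y + h),
                         (x + w, y + h, x, y + h), (x, y + h, x, y)]
        operands.clear()  # operands belong to exactly one operator
    return segments


def horizontal_ys(segments: List[Segment], tol: float = 0.5) -> List[float]:
    """Return the y positions of horizontal segments, topmost first
    (PDF user space has y increasing upwards by default)."""
    ys = {round(y1, 1) for x1, y1, x2, y2 in segments
          if abs(y1 - y2) <= tol and abs(x1 - x2) > tol}
    return sorted(ys, reverse=True)


stream = "50 700 m 300 700 l S 50 680 m 300 680 l S 50 680 m 50 700 l S"
print(horizontal_ys(collect_segments(stream)))
```

Under these assumptions, the first value returned by `horizontal_ys` is a candidate for the top border of a table, which you could then compare against the text baselines you already extract.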
