Find PDF Dimensions with Camelot

Question

I am using Camelot to read complete PDFs and extract about 112 attributes from each one.

I use table areas to extract the attributes:

 import camelot
 test_variable = camelot.read_pdf(filename, flavor='stream',
                 table_areas=['38,340,50,328'])

The issue is that the table area for a given attribute is not constant across documents. Sometimes the same attribute sits a few pixels away in the x or y direction in another document:

 test_variable = camelot.read_pdf(filename, flavor='stream', 
                 table_areas=['38,350,50,338']) 

Is there a way to reliably extract the same attribute from roughly the same area, regardless of which document is being processed?

Recommended answer

Maybe the table_regions option (introduced in 0.7) can help you.

https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-regions

"When table_regions is specified, Camelot will only analyze the specified regions to look for tables."

You can define a larger table_regions area that covers every position where the attribute may appear, and Camelot will search for tables within that area.
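As a rough sketch of that idea, the snippet below expands the exact area from the question by a fixed margin and passes the result as table_regions. The helper names padded_region and read_attribute, and the 15-unit margin, are assumptions made up for illustration and are not part of Camelot. Whether the stream parser actually detects the attribute inside the enlarged region still depends on the document layout.

```python
def padded_region(area, margin):
    """Expand an 'x1,y1,x2,y2' area string by `margin` on every side.

    Camelot areas use (x1, y1) as the top-left corner and (x2, y2) as the
    bottom-right corner in PDF coordinate space, where y grows upward.
    """
    x1, y1, x2, y2 = (float(v) for v in area.split(","))
    return f"{x1 - margin:g},{y1 + margin:g},{x2 + margin:g},{y2 - margin:g}"


def read_attribute(filename, area, margin=15):
    """Hypothetical wrapper: search a padded region instead of an exact area."""
    import camelot  # imported lazily so padded_region works without Camelot

    region = padded_region(area, margin)
    # table_regions (Camelot >= 0.7) limits table detection to this region
    return camelot.read_pdf(filename, flavor="stream", table_regions=[region])


# The padded region covers both areas seen in the question
# ('38,340,50,328' and '38,350,50,338').
print(padded_region("38,340,50,328", 15))
```

Calling read_attribute(filename, '38,340,50,328') would then look for the attribute anywhere in the padded region. A margin that is too large may pull in neighbouring attributes, so it is worth tuning against a few sample documents.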
