Looking for a Linux PDF library to extract annotations and images from a PDF
Question
I'm looking for a free library (Java/Ruby) that runs on Linux and can extract images and annotations from PDFs, similar to what CGPDFDocument can do on OS X.
Thanks!
Answer
I don't know about images, but using the latest version of the Ruby pdf-reader gem I was able to successfully extract the annotations from a large PDF file:
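require "pdf-reader"

filename = ARGV.fetch(0) # path to the PDF, taken from the command line here

PDF::Reader.open(filename) do |reader|
  reader.pages.each do |page|
    # /Annots may be an indirect reference or an inline array; deref handles both
    # (passing an inline array to reader.objects[...] would raise an error)
    annots = reader.objects.deref(page.attributes[:Annots]) || []
    annots.each do |annot_ref|
      annot = reader.objects.deref(annot_ref)
      # Only print annotations that carry a /Contents text entry
      next if annot.nil? || annot[:Contents].nil?

      puts "Page #{page.number}, #{annot[:Contents].inspect}"
    end
  end
end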
I imagine something similar could be done to extract images.
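The answer only guesses at images, so here is a minimal sketch, assuming the same pdf-reader gem. Image XObjects are reached through `page.xobjects`, and JPEG images (`/Filter /DCTDecode`) can be written out unchanged because their raw stream data is already a complete JPEG file. Other encodings (for example FlateDecode pixel data) would have to be decoded and re-encoded into an image format, and images nested inside Form XObjects are not visited here. The helper names `jpeg_image?`, `output_name` and `extract_jpegs`, and the `page-N-Name.jpg` naming scheme, are hypothetical.

```ruby
# Sketch: save the JPEG images embedded in each page of a PDF using pdf-reader.
# Assumption: only DCTDecode (JPEG) image XObjects are handled; other encodings
# and images inside Form XObjects are skipped.

# True if an XObject dictionary describes a JPEG-encoded image.
# /Filter may be a single name or an array of names.
def jpeg_image?(dict)
  dict[:Subtype] == :Image && Array(dict[:Filter]) == [:DCTDecode]
end

# Hypothetical naming scheme for the output files.
def output_name(page_number, xobject_name)
  "page-#{page_number}-#{xobject_name}.jpg"
end

def extract_jpegs(filename)
  require "pdf-reader"

  PDF::Reader.open(filename) do |reader|
    reader.pages.each do |page|
      page.xobjects.each do |name, value|
        # Values are normally streams already; deref returns them unchanged
        stream = reader.objects.deref(value)
        next unless stream.is_a?(PDF::Reader::Stream) && jpeg_image?(stream.hash)

        # For DCTDecode images the raw (still encoded) stream data is the JPEG
        File.binwrite(output_name(page.number, name), stream.data)
      end
    end
  end
end

# Only run when this file is executed directly with a PDF path argument
extract_jpegs(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

Saved as, say, `extract_jpegs.rb`, it would be run as `ruby extract_jpegs.rb input.pdf` and writes files such as `page-1-Im0.jpg` into the current directory. Restricting the sketch to JPEG keeps it short, since no image decoding library is needed.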