How to compare two PDFs based on visual differences programmatically?

Views: 84 | Published: 2020/9/4 23:06:04 | Tags: java apache pdf pdfbox apache-tika


Problem description

I need to compare two PDF files and find all the visual differences between them. I know there are related questions on Stack Overflow, but they don't meet my needs.

I'm currently using PDFBox to render each page of the PDFs to an image and then comparing the bytes of the images.
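A minimal sketch of that kind of image comparison, assuming both pages have already been rendered to `BufferedImage` objects (for example with PDFBox's `PDFRenderer`). The class `PageImageDiff` and the method `diffBounds` are hypothetical names made up for illustration and are not part of PDFBox. It compares decoded pixels rather than raw bytes, and additionally reports the bounding box of the differing region:

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of PDFBox): compares two rendered page images.
final class PageImageDiff {

    /**
     * Returns the smallest rectangle that contains every differing pixel,
     * or null if the two images are pixel-identical.
     * Images of different sizes are reported as differing over the
     * larger of the two extents.
     */
    static Rectangle diffBounds(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return new Rectangle(0, 0,
                    Math.max(a.getWidth(), b.getWidth()),
                    Math.max(a.getHeight(), b.getHeight()));
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = -1, maxY = -1;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                // getRGB returns the pixel as packed ARGB in the default sRGB model
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // no differing pixel was found
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }
}
```

Calling `PageImageDiff.diffBounds(imageA, imageB)` for each pair of pages gives either `null` or the pixel region that changed. This only locates where a page differs; it does not say what changed (font size, image content, and so on).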

With this approach, I can tell that a particular page differs.

But I need finer details, such as the font size of some text. For example: "The text" differs on a given page, say page 6, of the PDFs.

This applies not only to text: I need to catch all visual differences, such as images, text in charts, etc.

Please suggest a way to achieve this.

PS: I tried Apache Tika, but my impression is that it is meant for extracting structured text as XHTML, plus metadata. Fine details such as font size and font weight do not appear in the structured text. Please correct me if I'm wrong.
