How to compare two PDFs based on visual differences programmatically?

Views: 24 | Published: 2021/11/14 23:47:19 | Tags: java apache pdf pdfbox apache-tika


Problem description

I need to compare two PDF files and find all the visual differences between them. I know there are related questions on Stack Overflow, but they do not meet my needs.

I'm currently using PDFBox to render each page of the PDFs to an image and then comparing the bytes of the images.

With this approach I can tell that a particular page differs.
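To make the page-image comparison described above concrete, here is a minimal sketch that compares two page images pixel by pixel instead of byte by byte, and reports the bounding box of the region that differs. It is an illustration under stated assumptions, not a complete solution: the class name `PageImageDiff` and the method `diffBounds` are hypothetical, and the images are assumed to have been rendered beforehand (for example with PDFBox's `PDFRenderer`) at the same DPI. The sketch itself uses only `java.awt` classes.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper: locates where two rendered page images differ.
public class PageImageDiff {

    /**
     * Returns the smallest rectangle containing every differing pixel,
     * or null if the two images are pixel-identical.
     */
    public static Rectangle diffBounds(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            // Different page sizes: report the whole combined area as changed.
            return new Rectangle(0, 0,
                    Math.max(a.getWidth(), b.getWidth()),
                    Math.max(a.getHeight(), b.getHeight()));
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = -1, maxY = -1;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // no differing pixel found
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    public static void main(String[] args) {
        // Synthetic stand-ins for two rendered pages (assumption: same size).
        BufferedImage a = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage b = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        b.setRGB(20, 30, 0xFFFFFF);
        b.setRGB(40, 50, 0xFFFFFF);
        System.out.println(diffBounds(a, b));
    }
}
```

A single bounding box only says where on the page something changed; it does not say what changed (font size, image content, and so on), and exact pixel equality can be sensitive to anti-aliasing differences between renders.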

But I need finer details, such as the font size of some text, and I need to be able to report, for example, that "The text" differs on page 6 of the PDFs.

This applies not only to text: I need to handle all visual differences, such as images, text in charts, and so on.

Please suggest a way to achieve this.

PS: I tried Apache Tika, but my impression is that it is meant for extracting structured text as XHTML, plus metadata. Fine details such as font size and font weight do not appear in the structured text. Please correct me if I'm wrong.
