Transform PDF to HTML, keep layout

Views: 28 Published: 2021/9/23 20:22:55 Tags: html pdf

Question

What methods are there to transform a PDF to HTML? It could be anything: an online service, software, or a library. (Open source preferred. For a library, PHP or Python would be preferred.) It has to keep the original layout (including page numbers, footnotes and such), keep the images (combining them into a single background image per page is acceptable) and keep the links. It should preferably output valid XHTML and clean up PDF features such as ligatures, but if some post-processing is required, I can live with that. Something with clean, relatively semantic HTML output would be great.

The closest one I found was zamzar.org, but it choked on links. (Also, the HTML output is an ugly heap of absolutely positioned divs and needs post-processing because of encoding problems.)
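
If ligature cleanup ends up as a post-processing step, it might look something like the Python sketch below. It assumes the converter leaves Latin ligature code points (U+FB00 to U+FB06, such as "ﬁ" and "ﬃ") in the extracted text. `expand_ligatures` is a hypothetical helper name, not part of any converter. The sketch relies only on the standard library's `unicodedata.normalize`.

```python
import unicodedata

# Latin ligatures sit in the Alphabetic Presentation Forms block, U+FB00..U+FB06.
LIGATURE_RANGE = range(0xFB00, 0xFB07)


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature code points with their plain-letter equivalents.

    Only characters in LIGATURE_RANGE are decomposed, so accented letters and
    other symbols in the converted text are left exactly as they were.
    """
    return "".join(
        unicodedata.normalize("NFKD", ch) if ord(ch) in LIGATURE_RANGE else ch
        for ch in text
    )
```

Restricting the decomposition to that one range is a deliberate choice: running compatibility normalization over the whole string would also rewrite unrelated characters (circled digits, superscripts and the like), which may not be wanted when the goal is to preserve the original document.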
