Is there any Java library for converting a document from PDF to HTML?


Question



An open-source implementation is preferred.

Answer

Obviously, this isn't an easy task: PDF formatting is much richer than HTML's (and you also have to extract the images and link them, etc.).
Simple text extraction is much simpler (although not trivial).
In the sidebar of your question I see a similar one, Converting PDF to HTML with Python, which points to a library (poppler, apparently written in C++, which could perhaps be accessed through JNI/JNA) and to a related question that offers even more answers.
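
If binding to poppler through JNI/JNA is more work than you want, one alternative is to shell out from Java to the `pdftohtml` command-line tool that ships with poppler's utilities. This is only a rough sketch, and it swaps in a different technique from the JNI/JNA route mentioned above. It assumes `pdftohtml` is installed and on the `PATH`; the class name `PdfToHtmlSketch` and its methods are made up for illustration, and the exact output file naming can vary between tool versions.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: runs poppler's pdftohtml as an external process.
class PdfToHtmlSketch {

    // Builds the argument list. "-noframes" asks pdftohtml for a single
    // HTML document instead of a frameset.
    static List<String> buildCommand(Path pdf, Path html) {
        return List.of("pdftohtml", "-noframes", pdf.toString(), html.toString());
    }

    // Starts the external tool and waits for it to finish.
    // Returns the process exit code (0 conventionally means success).
    static int convert(Path pdf, Path html) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, html))
                .inheritIO() // forward the tool's stdout/stderr to our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PdfToHtmlSketch <input.pdf> <output.html>");
            System.exit(2);
        }
        System.exit(convert(Path.of(args[0]), Path.of(args[1])));
    }
}
```

Spawning a process avoids writing any native bindings, at the cost of depending on an external binary being installed wherever the Java program runs.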
