Converting from PDF to HTML

Question

Is there a .dll that takes a PDF file as input and produces an HTML file as output? I want to convert from PDF to HTML. My colleague says it is very difficult to go step by step, extracting the text, fonts, images, margins, links, and so on from the PDF and then creating a new HTML file with the same content. He says it's nearly impossible. So I was wondering whether there is a DLL I can reference to do that.

Recommended answer

Writing a program to do it is definitely not trivial. If you don't find a .NET library that does this (I couldn't, at least not a free one), I would just download PDFToHtml and invoke it programmatically to get the HTML.
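
"Invoke it programmatically" can be sketched as launching the converter as a child process. The snippet below is only an illustration in Python and not part of the original answer. It assumes a command-line tool named pdftohtml is installed and on PATH, that it accepts the -noframes option (as the Poppler build of pdftohtml does), and that it writes its result to <out_base>.html. The function names build_command and convert are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, out_base, exe="pdftohtml"):
    """Build the argument list for one converter run.

    Assumption: the tool accepts -noframes, which asks for a single
    HTML file instead of a frameset.
    """
    return [exe, "-noframes", str(pdf_path), str(out_base)]


def convert(pdf_path, out_base, exe="pdftohtml"):
    """Run the converter and return the path where output is expected."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} was not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_command(pdf_path, out_base, exe), check=True)
    # Assumption: the tool appends ".html" to the output base name.
    return Path(str(out_base) + ".html")
```

A call such as convert("report.pdf", "report") would then be expected to leave the result in report.html. From .NET, the same idea would be to start the executable with System.Diagnostics.Process and read the generated file afterwards.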

If you have the time to spare, or PDFToHtml does not produce acceptable output for you, you could use iText to write the program yourself. It's a very mature, free PDF library. I've used it in the past to manipulate PDFs (merging, creating, and so on).

Update

As noted in the comment by Quandary, the PDFSharp library offers a more permissive license (MIT) than the commercial or AGPL licenses offered by iText. Keep this in mind when choosing your library. I have not used PDFSharp myself, and I don't know how the two compare in terms of functionality.
