PDF to HTML conversion in ASP.NET (C#)


Question

How can I convert PDF files to HTML? I have tried pdftohtml and pdf2text. I read somewhere that pdftohtml generates a better HTML layout. I used the command:

pdftohtml  -c 1.pdf 2.htm



This generates an HTML file, but the layout does not match the original PDF. Can anyone suggest a better solution for PDF to HTML conversion?

Note: HTML file may have images.
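
For reference, the command above could be launched from C# roughly as follows. This is only a minimal sketch: it assumes the `pdftohtml` executable is on the PATH of the account running the application, and the class name `PdfToHtmlRunner` and method name `RunPdfToHtml` are made up for illustration. It does not improve the layout of the output; it only shows how the same command would be run from code.

```csharp
using System;
using System.Diagnostics;

static class PdfToHtmlRunner
{
    // Runs "pdftohtml -c <inputPdf> <outputHtml>" and returns the exit code.
    // Assumption: pdftohtml is installed and reachable via PATH; if it is not,
    // Process.Start throws a Win32Exception.
    public static int RunPdfToHtml(string inputPdf, string outputHtml)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "pdftohtml",
            // Quote both paths so names containing spaces survive.
            Arguments = "-c \"" + inputPdf + "\" \"" + outputHtml + "\"",
            UseShellExecute = false,       // required to redirect streams
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            // Read stderr before waiting so a full pipe cannot block the child.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                Console.Error.WriteLine(errors);
            }
            return process.ExitCode;
        }
    }

    static void Main()
    {
        // Same file names as in the command shown in the question.
        int exitCode = RunPdfToHtml("1.pdf", "2.htm");
        Console.WriteLine("pdftohtml exited with code " + exitCode);
    }
}
```

In a web application the worker process identity also needs permission to run the executable and to write to the output folder.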

Recommended answer

There is a basic problem here: PDF is essentially an output format, and all the semantic information you would want in your HTML (this is a header, that is a table, that is a subelement of that) is simply missing. There's also the issue that PDF is a page-based format and HTML isn't, so any sort of vaguely layout-preserving translation won't feel right as an HTML page.

The short answer to this one is: don't do it. Either serve up the PDF (modern browsers generally know how to deal with one), or start from something earlier in the production chain, i.e. whatever you used to generate the PDF in the first place.
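
As a rough illustration of the "serve up the PDF" option, a classic ASP.NET (System.Web) handler might look like the sketch below. The class name `PdfHandler` and the path `~/App_Data/1.pdf` are assumptions chosen for the example, and the handler would still have to be registered (for instance as an `.ashx` file or in `web.config`).

```csharp
using System.Web;

// Hypothetical handler: streams a PDF stored on the server to the browser.
public class PdfHandler : IHttpHandler
{
    // The handler holds no per-request state, so one instance can be reused.
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        // Assumed location of the file; adjust to wherever the PDF is stored.
        string path = context.Server.MapPath("~/App_Data/1.pdf");

        context.Response.ContentType = "application/pdf";
        // "inline" asks the browser to display the PDF rather than download it.
        context.Response.AddHeader("Content-Disposition", "inline; filename=\"1.pdf\"");
        // TransmitFile writes the file to the response without buffering it in memory.
        context.Response.TransmitFile(path);
    }
}
```

With this approach the browser's own PDF viewer renders the document, so the original layout and images are preserved without any conversion step.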


Hi

http://stackoverflow.com/questions/2295555/how-to-convert-pdf-into-html-using-c-sharp

