Convert PDF to HTML in PHP similar to DocuSign


Question


We are developing a website that needs to convert PDF files into HTML because some of the PDFs contain forms (not necessarily fillable PDFs; these PDFs are printed out to be filled in).


So we want them to be filled in through our website instead of printing the files and filling them in by pen. We are going paperless.


DocuSign provides this: you can upload a PDF and then customize it to add text boxes and checkboxes. So we are using DocuSign as a reference, but we still haven't figured out how they do it (an almost perfect conversion of PDF to HTML and vice versa).


So far I've tried several third-party tools for converting PDF to HTML: XPDF, Poppler, and ImageMagick.


ImageMagick converts a PDF to an image, which is not suitable because these images are very large when converted back to a PDF for printing.


From my research, Poppler is a fork of XPDF. I tried it after XPDF to see if it was better. It basically does what XPDF does, but when converting to HTML it produces larger pixel values in the CSS. That's fine, but it loses the font family.


XPDF converts PDF to HTML, but the pixel values are smaller, so when I convert it back to PDF it doesn't fill the whole page, and I still have to adjust all the CSS manually to make it fit.


After using these third-party tools, I convert the HTML files back into PDF using mPDF, and the converted files have many inconsistencies. Text is not aligned properly. It's basically not the same as the original PDF.

Any help would be appreciated!

Answer


What you are trying to do is not as straightforward as it may seem. I have worked with Adobe Sign, formerly known as EchoSign, for years, and I have a pretty good idea of how these services work. That being said, I strongly suggest looking into one of these eSign services instead of trying to roll your own. It will save you a lot of time.

Here is how it works:

  1. The PDF itself must contain a form with named fields. In other words, if you open such a PDF in Adobe Reader or Chrome, you should be able to fill in the fields. If your PDF does not have a PDF form, you will need additional software such as Acrobat Pro to create the form.
  2. You must convert the PDF into a flat image that can be rendered in the browser.
  3. You will need a tool to extract the PDF form information, such as the field names, types, dimensions, and coordinates.
  4. With all this information, you can then render the PDF image(s) in the browser. Place absolutely positioned HTML form elements over the image using the field types, dimensions, and coordinates from the previous step. Each HTML element needs to reference a PDF form field by name.
  5. Once you have collected the data from your HTML widgets as a map such as field_name => field_value, you will need additional software to programmatically fill in the form in the original PDF. PDF form data is often stored in an FDF or XFDF file.
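
A minimal sketch of steps 2 to 4, written in Python for illustration (the arithmetic ports directly to PHP). It assumes the page image was rendered at a known resolution, for example with Poppler's `pdftoppm -r 144 -png form.pdf page`, and that step 3 produced each widget's rectangle as `(llx, lly, urx, ury)` in PDF points. PDF user space uses 1/72-inch units with the origin at the bottom-left, while the image and CSS start at the top-left, so the y axis is flipped and everything is scaled by `dpi / 72`. `FieldInfo`, `rect_to_css`, `overlay_html`, and the default checkbox value `Yes` are hypothetical, and the sketch ignores rotated pages and crop boxes that do not start at (0, 0).

```python
from dataclasses import dataclass
from html import escape

POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch


@dataclass
class FieldInfo:
    """Hypothetical record for one form widget extracted in step 3."""
    name: str
    kind: str                 # "text" or "checkbox" in this sketch
    rect: tuple               # (llx, lly, urx, ury) in PDF points, origin bottom-left
    on_value: str = "Yes"     # checkbox "on" state name; assumed, check the actual PDF


def rect_to_css(rect, page_height_pt, dpi):
    """Convert a PDF rectangle to CSS pixel offsets on an image rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    llx, lly, urx, ury = rect
    return {
        "left": round(llx * scale),
        # PDF y grows upwards, CSS top grows downwards: measure from the page top.
        "top": round((page_height_pt - ury) * scale),
        "width": round((urx - llx) * scale),
        "height": round((ury - lly) * scale),
    }


def overlay_html(image_url, fields, page_width_pt, page_height_pt, dpi):
    """Page image plus absolutely positioned inputs named after the PDF fields."""
    scale = dpi / POINTS_PER_INCH
    parts = [
        '<div style="position:relative;width:%dpx;height:%dpx">'
        % (round(page_width_pt * scale), round(page_height_pt * scale)),
        '  <img src="%s" alt="" style="position:absolute;left:0;top:0">'
        % escape(image_url),
    ]
    for f in fields:
        box = rect_to_css(f.rect, page_height_pt, dpi)
        style = ("position:absolute;left:{left}px;top:{top}px;"
                 "width:{width}px;height:{height}px").format(**box)
        if f.kind == "checkbox":
            attrs = 'type="checkbox" value="%s"' % escape(f.on_value)
        else:
            attrs = 'type="text"'
        # Using the PDF field name as the input name keeps the submitted data
        # in the field_name => field_value shape needed for filling the PDF.
        parts.append('  <input %s name="%s" style="%s">'
                     % (attrs, escape(f.name), style))
    parts.append("</div>")
    return "\n".join(parts)
```

For a US Letter page (612 x 792 points) rendered at 144 DPI, a field whose rectangle is (72, 700, 272, 720) lands at left 144px and top 144px, 400px wide and 40px tall. Because each input is named after its PDF field, the submitted form data already has the field_name => field_value shape that step 5 needs.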


I don't know of a single tool that will help you with everything outlined above, at least not in PHP. However, I can offer some suggestions that may be helpful:

  • PDFtk Server - Can help you both extract the PDF form field information and fill in the form from an XFDF file. Unfortunately, the form field information you can extract with this tool does not include dimensions and coordinates.
  • iText - A library available for .NET and Java that can be used to extract detailed information about the PDF form, including the dimensions and coordinates of the fields. You can use this toolkit to build a microservice that communicates with PHP.
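
To make the PDFtk route more concrete, here is a minimal Python sketch that shells out to `pdftk`, parses its `dump_data_fields` report, writes an XFDF document from a field_name => field_value map, and builds the `fill_form` command for the original PDF. It assumes `pdftk` is on the PATH and that field names are flat (XFDF expresses hierarchical `parent.child` names as nested `<field>` elements, which this sketch does not do). The function names are illustrative, and the same commands can be run from PHP with `exec()` or `proc_open()`.

```python
import subprocess
from xml.sax.saxutils import escape, quoteattr


def dump_fields(pdf_path):
    """Run `pdftk <pdf> dump_data_fields` and parse its text report."""
    result = subprocess.run(["pdftk", pdf_path, "dump_data_fields"],
                            capture_output=True, text=True, check=True)
    return parse_dump_data_fields(result.stdout)


def parse_dump_data_fields(text):
    """Turn the '---'-separated 'Key: Value' report into a list of dicts.

    FieldStateOption can repeat (e.g. the states of a checkbox), so it is
    collected into a list; multi-line values are not handled in this sketch.
    """
    fields, current = [], {}
    for line in text.splitlines():
        if line.strip() == "---":
            if current:
                fields.append(current)
            current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore blank or unexpected lines
        value = value[1:] if value.startswith(" ") else value
        if key == "FieldStateOption":
            current.setdefault(key, []).append(value)
        else:
            current[key] = value
    if current:
        fields.append(current)
    return fields


def build_xfdf(values):
    """Serialize a flat field_name => field_value map as an XFDF document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">',
             "  <fields>"]
    for name, value in values.items():
        lines.append("    <field name=%s><value>%s</value></field>"
                     % (quoteattr(name), escape(str(value))))
    lines += ["  </fields>", "</xfdf>"]
    return "\n".join(lines)


def fill_form_command(pdf_in, xfdf_path, pdf_out, flatten=False):
    """Build the pdftk command that fills `pdf_in` from an XFDF file."""
    cmd = ["pdftk", pdf_in, "fill_form", xfdf_path, "output", pdf_out]
    if flatten:
        cmd.append("flatten")  # merge the values into the page content
    return cmd
```

In practice you would write the output of `build_xfdf(...)` to a file such as `data.xfdf` and then call `subprocess.run(fill_form_command("form.pdf", "data.xfdf", "filled.pdf"), check=True)`. For checkboxes, the value should be one of the field's `FieldStateOption` entries (often `Yes` and `Off`, but that depends on how the form was built).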


There are definitely many more tools out there for the job. Hopefully this information will point you in the right direction or help you decide how to move forward with your project.
