Edit *existing* PDF in a browser

Question

I have a web application that is currently getting a base64 representation of a PDF from the server. I'm able to use Mozilla's pdf.js to display this on a <canvas> and toggle through the pages with a dropdown.

According to everything I've been able to find, including "Can Mozilla's pdf.js modify PDFs?", it's not possible to edit the PDF with pdf.js.

I've found jsPDF, and while I'm able to take the canvas, call .toDataURL() on it for each page, and build a new PDF document from the results, there are two issues:


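  1. The newly generated PDF is just a series of images, one per page, so any text in the original PDF is only an image once I'm done.

  2. I generate a new PDF with jsPDF and then send its base64 back to pdf.js to display it on the canvas. Something happens between these steps that scales the page images incorrectly, so after each regeneration every page takes up only about 3/4 of the canvas. I haven't been able to keep the same size/scale.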
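
One possible explanation for the 3/4 shrink, offered as an assumption rather than a confirmed diagnosis, is a unit mismatch: PDF page sizes are measured in points (1/72 inch) while CSS pixels are 1/96 inch, and 72/96 is exactly 0.75. The sketch below is plain arithmetic with no jsPDF or pdf.js calls; `pxToPt`, `pageSizeFromCanvas` and the `renderScale` parameter are hypothetical names used only for illustration.

```typescript
// Sketch only: unit arithmetic, no jsPDF or pdf.js calls.
const POINTS_PER_INCH = 72; // PDF user-space unit
const CSS_PIXELS_PER_INCH = 96; // CSS reference pixel

interface Size {
  width: number;
  height: number;
}

/** Convert a length in CSS pixels to PDF points. */
function pxToPt(px: number): number {
  return (px * POINTS_PER_INCH) / CSS_PIXELS_PER_INCH;
}

/**
 * Page size in points that reproduces the original page, given the canvas
 * size in pixels and the scale the page was rendered at
 * (hypothetical parameter: 1 means one point maps to one canvas pixel).
 */
function pageSizeFromCanvas(canvas: Size, renderScale: number): Size {
  return {
    width: canvas.width / renderScale,
    height: canvas.height / renderScale,
  };
}

// A US Letter page (612 x 792 pt) rendered at scale 1 gives a 612 x 792 px canvas.
const canvasSize: Size = { width: 612, height: 792 };

// If those pixel counts are later treated as CSS pixels and converted to
// points, the page shrinks to 3/4 of its size on every round trip.
const shrunk: Size = {
  width: pxToPt(canvasSize.width),
  height: pxToPt(canvasSize.height),
};

// Dividing by the render scale instead keeps the page size stable.
const stable: Size = pageSizeFromCanvas(canvasSize, 1);
```

If this is what is happening, choosing the new page size from the original page dimensions, rather than from the canvas pixel counts, should keep the scale constant across round trips.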

jsPDF doesn't appear to have a way to load an existing PDF; it only creates new ones. pdfmake and PDFKit also look like they only create new PDF files.

So my question:

Is there anything that will allow for both viewing a pdf (from base64) and for making changes to it? Ideally I'd watch for changes to the canvas, then draw that change onto the pdf page. When done, export that to a base64 string to send back to the server.
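
Whatever library ends up doing the editing, the base64 round trip with the server is the easy part. A minimal sketch, assuming the standard `atob`/`btoa` globals are available (they are in browsers and in recent Node.js versions); the helper names `base64ToBytes`, `bytesToBase64` and `looksLikePdf` are made up for this example.

```typescript
/** Decode a base64 string into raw bytes. */
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

/** Encode raw bytes as a base64 string. */
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((value) => {
    binary += String.fromCharCode(value);
  });
  return btoa(binary);
}

/** Cheap sanity check: a PDF file starts with the bytes "%PDF-". */
function looksLikePdf(bytes: Uint8Array): boolean {
  const header = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  return header.every((value, i) => bytes[i] === value);
}

// "JVBERi0xLjQ=" is the base64 encoding of the 8 characters "%PDF-1.4".
const pdfBytes = base64ToBytes("JVBERi0xLjQ=");
const roundTrip = bytesToBase64(pdfBytes);
```

The decoded bytes are what a viewer or editing library would consume, and `bytesToBase64` produces the string to send back to the server once the changes are done.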

Recommended answer

Quick answer - no and it is quite unlikely you will find a cross-browser solution. It is very unlikely that you will find a PDF-perfect solution. Better to think about having the users edit HTML and generate the PDF at the server.

Why - the PDF format is both brilliant and fiendish at the same time. Brilliant because of its portability, but fiendish because of the internal structure and storage mechanisms. There is no friendly 'DOM' like with HTML. If we were starting out afresh to develop a portable document format it would not be PDF that we would choose. But PDF currently has too much momentum to be thrown away, period.

Younger viewers might be wondering how the hell this manic format got into its market leading position and where it came from. Well, when the founding fathers of PDF were laying down the design, before XML, JSON, HTML and even the Internet, they weren't working with today's document sharing in mind. They were working on a better way to encode printing instructions - the PostScript printer driver concept. These were never expected to be edited before the printer consumed them, and they were worthless for any other purpose. Then someone noticed that you could interpret the PostScript drawing instructions to a screen, and subsequently someone spotted the fantastic potential to employ this as a transportable, cross-device display concept. And here we are.

Back to the question - to edit a PDF in any meaningful GUI way, you would need to unpack the PDF and render the components (images, formatted text, pages) to the display device; then allow folks to mess with the layout; then re-pack the PDF. You would have to do this perfectly in line with the PDF standards, otherwise downstream consumers of your edited PDF file may crash or be unable to render it. You would have to cater for the various Acrobat standard levels, and for the shortcuts and bloat that the editing package (Word, Illustrator, InDesign) vendors chuck into the PDF file: layers, thumbnails, etc.

Then we come to colors. Have a read of the PDF spec and you will see that there is an array of colorspace options that the original PDF producer can decide to use. You would have to interpret these to a reasonable device color on the screen and back, etc.
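
To give a flavour of what "interpret to a device color" means, here is a deliberately naive sketch that turns a DeviceCMYK value into screen RGB. It is the textbook approximation, not what a color-managed viewer does (ICC-based and other colorspaces need proper profile handling), and the name `cmykToRgbNaive` is invented for this example.

```typescript
type Rgb = { r: number; g: number; b: number };

/**
 * Naive DeviceCMYK to RGB conversion, ignoring ICC profiles.
 * All four inputs are ink amounts in the range 0..1.
 */
function cmykToRgbNaive(c: number, m: number, y: number, k: number): Rgb {
  // Each RGB channel is reduced by its complementary ink and by black.
  const channel = (ink: number): number =>
    Math.round(255 * (1 - ink) * (1 - k));
  return { r: channel(c), g: channel(m), b: channel(y) };
}

const paperWhite = cmykToRgbNaive(0, 0, 0, 0); // no ink at all
const pureBlack = cmykToRgbNaive(0, 0, 0, 1); // full black ink
const red = cmykToRgbNaive(0, 1, 1, 0); // full magenta plus full yellow
```

Going the other way, from an edited on-screen color back to the document's original colorspace, is where it gets properly awkward, because the mapping is not uniquely reversible.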

And then fonts. Fonts might be embedded as a subset, or not embedded at all. To keep fidelity with the PDF you will need to realise the glyphs as vector graphics on your drawing surface at the scale defined in the PDF. This mostly means utilising some kind of platform-dependent type library, which is tricky cross-platform. On top of that, you will need to license the fonts for appropriate use, which can be pricey for the fonts most people want to use to look hip and professional.

Given the layering, scaling and rotating features in PDF, you would likely be looking at an html canvas as the drawing surface. Anyone who knows will tell you that in the world of canvas you are pretty much on your own for word-processing type functions.

Not impossible, but very hard.

Components that render PDF to a display are largely acting as print drivers, slavishly obeying the PDF drawing instructions, and usually generating a raster or sometimes an SVG graphic. This is a one-way street - they read and draw, but there is no sense of 'handles' to the objects drawn. No handles means no manipulation, and these guys certainly have little intention of letting you modify and write back.

You will find many 'save to pdf' products. When client-side, they will be leaning toward grabbing a set of pixels and dumping a raster graphic into a file with the thinnest veneer of 'PDF' definition wrapped around it. Where they are server-based they can be quite powerful - there are plenty of tools like Aspose and ABCPDF that truly offer some server-side PDF wrangling - but this is not what you are looking for in your OP.

Summary - very complicated subject. If anything ever emerges as a potential it will likely have many constraints in terms of the PDF features covered and thus restrictions on what it can safely edit.

If you are looking for online editing of documents that are ultimately exported as PDF, then a way forward is to keep an html version of the document source and have the user edit this with TinyMCE, CKEditor, etc, then use one of the server-side tools to take the saved source HTML and render out to PDF. Tools like ABCPDF render HTML faithfully and let you add images, headers and footers, page numbers, etc.

This is a pragmatic answer to your (assumed) need, though it still has some trade-offs in terms of the font (licensing) issues, clunkiness of browser-based editors, all-round weirdness of the HTML laid down by some HTML editing components, etc. But it IS viable.

Final thoughts - rethink the scope of what you need. If HTML editing with conversion to PDF at the server is usable for you, it is a well-trodden path, and you will find both free and commercial components for client and server to support it.

EDIT: If you need to annotate the PDF then things are much easier. On the server, you need to generate images of the pages of the document, send those to the client, display them to the user, let the user mark them up, capture the co-ordinates of the annotations back to the server and use a server-side PDF library to render the annotations into the PDF. It is achievable, though it requires various skillsets: server-side PDF-to-image manipulation, and client-side presentation and annotation capture.
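
The co-ordinate capture step has one trap worth sketching: the page image on screen has its origin at the top-left and is measured in pixels, while PDF default user space has its origin at the bottom-left and is measured in points. A minimal sketch, assuming an unrotated page whose lower-left corner sits at (0, 0); `imageToPdfPoint` and `pixelsPerPoint` are hypothetical names, not part of any library.

```typescript
interface Point {
  x: number;
  y: number;
}

/**
 * Map a point on the displayed page image (origin top-left, pixels) to
 * PDF user space (origin bottom-left, points).
 * Assumes an unrotated page whose lower-left corner is at (0, 0).
 */
function imageToPdfPoint(
  p: Point,
  pageHeightPt: number,
  pixelsPerPoint: number
): Point {
  return {
    x: p.x / pixelsPerPoint,
    // Flip the vertical axis: screen y grows downward, PDF y grows upward.
    y: pageHeightPt - p.y / pixelsPerPoint,
  };
}

// A click at (300, 150) px on a Letter page (792 pt tall) shown at 2 px per point.
const pdfPoint = imageToPdfPoint({ x: 300, y: 150 }, 792, 2);
```

Sending points in PDF units rather than raw pixels means the server does not need to know what zoom level the client was showing. Rotated pages or pages with an offset crop box would need extra handling that this sketch leaves out.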
