Remove all text from PDF file

Views: 271  Published: 2020/11/14 18:43:34  Tags: pdf-generation, ghostscript

Question

I am using Ghostscript to convert a source PDF file into an array of PNG images. Before I convert each PDF page into a PNG image, I need to remove all text from the PDF, so that the converted page image contains all other elements except the text.

Can I achieve this with Ghostscript, or will I need to look into other tools?

I would also be interested in a tool that can read my source PDF and save it with all the text removed.

Answer

Since my previous answer, development has continued, and a new option is available now, which justifies a new answer.

Recent versions of Ghostscript support three new parameters that let you remove all TEXT, all IMAGE, or all VECTOR elements from a PDF.

To remove all TEXT elements from an input PDF, run

gs -o no-more-texts.pdf -sDEVICE=pdfwrite -dFILTERTEXT   input.pdf
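
The question's end goal is PNG page images without text. A minimal sketch of that two-step pipeline, assuming a Ghostscript build recent enough to support -dFILTERTEXT: first rewrite the PDF with pdfwrite as above, then rasterize the intermediate file with the png16m device. The helper names (run, pdf_to_textless_pngs), the page-%03d.png output pattern and the 150 dpi resolution are illustrative assumptions; set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: remove all text with pdfwrite, then rasterize the text-free PDF to PNG.
# Assumptions: the intermediate file name, the page-%03d.png output pattern and
# the 150 dpi resolution are illustrative choices, not part of the answer.
# Set DRY_RUN=1 to print the gs commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # show the command line only
  else
    "$@"                 # actually execute it
  fi
}

# Hypothetical helper: pdf_to_textless_pngs SOURCE.pdf [INTERMEDIATE.pdf]
pdf_to_textless_pngs() {
  src=$1
  mid=${2:-no-more-texts.pdf}
  # Step 1: rewrite the PDF without any text elements.
  run gs -o "$mid" -sDEVICE=pdfwrite -dFILTERTEXT "$src" || return
  # Step 2: render every page to a 24-bit RGB PNG; %03d becomes 001, 002, ...
  run gs -o 'page-%03d.png' -sDEVICE=png16m -r150 "$mid"
}
```

Run it as `pdf_to_textless_pngs input.pdf`. Keeping the intermediate PDF on disk also covers the "read and save without text" part of the question, and lets you inspect the result before rasterizing.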

To remove all raster IMAGE elements from an input PDF, run

gs -o no-more-images.pdf -sDEVICE=pdfwrite -dFILTERIMAGE  input.pdf

To remove all VECTOR elements from an input PDF, run

gs -o no-more-vectors.pdf -sDEVICE=pdfwrite -dFILTERVECTOR input.pdf

Of course, you can also combine any two of the above parameters (combining all three will create empty pages).
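
When scripting this, it can be easier to state which element types to keep and derive the switches from that. A sketch under that assumption: keep_only is a hypothetical helper taking any of TEXT, IMAGE and VECTOR, and it refuses an empty selection because filtering all three types only yields blank pages.

```shell
#!/bin/sh
# Sketch: turn "element types to keep" into the -dFILTER* switches to pass.
# keep_only is a hypothetical helper; valid arguments are TEXT, IMAGE, VECTOR.
keep_only() {
  keep_text=0 keep_image=0 keep_vector=0
  for kind in "$@"; do
    case $kind in
      TEXT)   keep_text=1 ;;
      IMAGE)  keep_image=1 ;;
      VECTOR) keep_vector=1 ;;
      *) printf 'unknown element type: %s\n' "$kind" >&2; return 1 ;;
    esac
  done
  # Filtering all three element types would only produce blank pages.
  if [ $((keep_text + keep_image + keep_vector)) -eq 0 ]; then
    printf 'refusing to filter everything: pages would be empty\n' >&2
    return 1
  fi
  flags=""
  [ "$keep_text" = 1 ]   || flags="$flags -dFILTERTEXT"
  [ "$keep_image" = 1 ]  || flags="$flags -dFILTERIMAGE"
  [ "$keep_vector" = 1 ] || flags="$flags -dFILTERVECTOR"
  printf '%s\n' "${flags# }"   # drop the leading space
}
```

For example, `gs -o onlyIMG.pdf -sDEVICE=pdfwrite $(keep_only IMAGE) input.pdf` passes -dFILTERTEXT -dFILTERVECTOR. The command substitution is left unquoted on purpose so the switches split into separate arguments.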

Here are screenshots of a PDF page: the original contains all three element types, while each processed version looks different.

_{Screenshot of original PDF page containing "image", "vector" and "text" elements.}

Running the following 6 commands will create all 6 possible variations of remaining contents:


 gs -o noIMG.pdf   -sDEVICE=pdfwrite -dFILTERIMAGE                input.pdf
 gs -o noTXT.pdf   -sDEVICE=pdfwrite -dFILTERTEXT                 input.pdf
 gs -o noVCT.pdf   -sDEVICE=pdfwrite -dFILTERVECTOR               input.pdf

 gs -o onlyIMG.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT  input.pdf
 gs -o onlyTXT.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE input.pdf
 gs -o onlyVCT.pdf -sDEVICE=pdfwrite -dFILTERIMAGE  -dFILTERTEXT  input.pdf
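
The six commands above can also be generated from a small table in a loop. This is a sketch assuming a POSIX shell with gs on the PATH; filter_flags and make_variants are hypothetical helper names, input.pdf is the default input, and DRY_RUN=1 prints each gs command instead of running it.

```shell
#!/bin/sh
# Sketch: produce the six noXXX/onlyXXX variants in one loop.
# Assumptions: POSIX sh, gs on PATH; helper names are hypothetical.
# Set DRY_RUN=1 to print the gs commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # show the command line only
  else
    "$@"                 # actually execute it
  fi
}

# Print the -dFILTER* switches that produce a given variant.
filter_flags() {
  case $1 in
    noIMG)   printf '%s\n' "-dFILTERIMAGE" ;;
    noTXT)   printf '%s\n' "-dFILTERTEXT" ;;
    noVCT)   printf '%s\n' "-dFILTERVECTOR" ;;
    onlyIMG) printf '%s\n' "-dFILTERVECTOR -dFILTERTEXT" ;;
    onlyTXT) printf '%s\n' "-dFILTERVECTOR -dFILTERIMAGE" ;;
    onlyVCT) printf '%s\n' "-dFILTERIMAGE -dFILTERTEXT" ;;
    *) printf 'unknown variant: %s\n' "$1" >&2; return 1 ;;
  esac
}

make_variants() {
  input=${1:-input.pdf}
  for variant in noIMG noTXT noVCT onlyIMG onlyTXT onlyVCT; do
    flags=$(filter_flags "$variant") || return
    # $flags is unquoted on purpose so each switch becomes its own argument.
    run gs -o "$variant.pdf" -sDEVICE=pdfwrite $flags "$input" || return
  done
}
```

`DRY_RUN=1 make_variants input.pdf` prints the six command lines; drop DRY_RUN to actually write the six PDFs.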

The following image illustrates the results:

_{Top row, from left: all "text" removed; all "images" removed; all "vectors" removed. Bottom row, from left: only "text" kept; only "images" kept; only "vectors" kept.}
