How to open PDF in VC++ to extract text
Problem description
Hi,
I need to convert an encrypted PDF to text, using a password supplied by the user.
IN C++
I am able to open an (unencrypted) PDF in VC++ using zlib.
I open the PDF file in "rb" mode.
Do I need to write a custom function to decrypt the streams with the given password?
[STATUS: code working for unencrypted PDF]
IN JAVA
I found Java code that does PDF-to-text extraction using PDFBox-0.7.3, but I have some issues with it.
There is a null pointer exception in PDFBox-0.7.3\src\org\pdfbox\ExtractText.java at:
AccessPermission ap = document.getCurrentAccessPermission();
if (!ap.canExtractContent())
http://www.apache.org/dist/pdfbox/1.5.0/pdfbox-1.5.0-src.zip
[STATUS: code just creates an empty file]
Am I extracting the PDF in the wrong way?
What are the correct steps to decrypt a PDF?
Recommended answer
You can either:
(the fast route)
- Use a library (free or commercial). As already suggested, Google will help you find one.
- Study the PDF specifications (freely available) and write your own code to do the job.
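To give a rough idea of what writing your own code involves, here is a minimal C++ sketch of two building blocks used by the older RC4-based revisions of the PDF standard security handler: padding the user password to 32 bytes, and an RC4 routine. The function names `padPassword` and `rc4` are made up for this illustration. The sketch does not derive the actual file key (that also needs an MD5 hash over the padded password and several entries of the file's encryption dictionary, plus a per-object key step), and it does not cover AES-encrypted files, so treat it as a starting point rather than a working decryptor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// 32-byte padding string defined for the PDF standard security handler.
static const std::array<std::uint8_t, 32> kPdfPasswordPad = {{
    0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
    0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
    0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
    0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A}};

// Hypothetical helper: truncate the password to 32 bytes, or fill the
// remainder with bytes taken from the start of the padding string.
std::array<std::uint8_t, 32> padPassword(const std::string& password) {
    std::array<std::uint8_t, 32> out{};
    const std::size_t n = password.size() < 32 ? password.size() : 32;
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<std::uint8_t>(password[i]);
    for (std::size_t i = n; i < 32; ++i)
        out[i] = kPdfPasswordPad[i - n];
    return out;
}

// Hypothetical helper: plain RC4. The cipher is symmetric, so the same
// call both encrypts and decrypts. The key must not be empty.
std::vector<std::uint8_t> rc4(const std::vector<std::uint8_t>& key,
                              const std::vector<std::uint8_t>& data) {
    assert(!key.empty());
    // Key scheduling: start from the identity permutation and mix in the key.
    std::array<std::uint8_t, 256> s{};
    for (int i = 0; i < 256; ++i) s[i] = static_cast<std::uint8_t>(i);
    std::uint8_t j = 0;
    for (int i = 0; i < 256; ++i) {
        j = static_cast<std::uint8_t>(j + s[i] + key[i % key.size()]);
        std::swap(s[i], s[j]);
    }
    // Keystream generation, XORed with the input bytes.
    std::vector<std::uint8_t> out(data.size());
    std::uint8_t a = 0, b = 0;
    for (std::size_t k = 0; k < data.size(); ++k) {
        a = static_cast<std::uint8_t>(a + 1);
        b = static_cast<std::uint8_t>(b + s[a]);
        std::swap(s[a], s[b]);
        const std::uint8_t idx = static_cast<std::uint8_t>(s[a] + s[b]);
        out[k] = static_cast<std::uint8_t>(data[k] ^ s[idx]);
    }
    return out;
}
```

In a real extractor, the key passed to `rc4` would come from the key-derivation steps in the specification, and the decrypted stream bytes would then go through zlib as for an unencrypted file.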
It sounds like what you need is an SDK with an API that lets you open a PDF file and extract text from it. I'm sure you will find plenty by googling "PDF SDK".