Convert PDF editable fields into text using Java programming


Problem description

I have prepared an editable form, but I am unable to convert the PDF editable fields into text using Java.

APIs tried: pdfbox-app-2.0.0-RC2, PDFBox-0.7.3, itextpdf-5.1.0, pdfclown.

Please help me find out how to convert PDF editable fields into text in Java.

The Java program I used (it converts a normal PDF into text, but does not convert PDF editable fields into text):

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class PdfConvertor_1 {
    public static void main(String[] args) {
        selectPDFFiles();
    }

    // Let the user select one or more PDF files to convert
    public static void selectPDFFiles() {
        JFileChooser chooser = new JFileChooser();
        chooser.setFileFilter(new FileNameExtensionFilter("PDF", "pdf"));
        chooser.setMultiSelectionEnabled(true);
        if (chooser.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
            File[] files = chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for (int i = 0; i < files.length; i++) {
                convertPDFToText(files[i].toString(), "textfrompdf" + i + ".txt");
            }
            System.out.println("Conversion complete");
        }
    }

    // Extract the page text of src and write it to dest.
    // This reads page content only, so form field values are not included.
    public static void convertPDFToText(String src, String dest) {
        PdfReader reader = null;
        // The writer is closed automatically, even if extraction fails
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(dest))) {
            reader = new PdfReader(src);
            int pageCount = reader.getNumberOfPages();
            // Pages are numbered from 1
            for (int page = 1; page <= pageCount; page++) {
                bw.write(PdfTextExtractor.getTextFromPage(reader, page));
                bw.newLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }
}

Please check the editable fields in the image, which I want to convert to text using Java.

Recommended answer

Fields are not part of the page content stream, so getting the text from a page won't give you the value of a field.

You need to get the form from the PDF. A form is referenced from the root dictionary of a PDF, but there is a convenience method that returns an AcroFields object. This question was already answered for people using iTextSharp / C#: "How to read PDF form data using iTextSharp?"

PdfReader reader = new PdfReader(path_to_your_completed_form);
AcroFields fields = reader.getAcroFields();
String value = fields.getField(key);

In this snippet, path_to_your_completed_form is the full path you get from your JFileChooser, and key is the name of one of the fields defined in your form. The call returns that field's value as a String.

If you don't know which fields are defined in your form, please read the answer to the question "How to get specific types from AcroFields? Like PushButtonField, RadioCheckField, etc?" The example there contains code that loops over the available fields and tells you whether each field is a text field, a check box, a radio button, and so on.
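Once the field values have been read, turning them into text is a matter of formatting name/value pairs. The following is a minimal sketch of that last step only, and it deliberately uses nothing but the standard library. The class `FieldTextWriter`, the method `toText`, and the sample field names are hypothetical and not part of iText. It assumes you have filled the map yourself, for example by calling `fields.getField(name)` for each name in `fields.getFields().keySet()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of iText: renders form field values as plain text.
final class FieldTextWriter {

    // Builds one "name: value" line per field, in the map's iteration order.
    // A null value is rendered as an empty string so every field is still listed.
    public static String toText(Map<String, String> fieldValues) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : fieldValues.entrySet()) {
            String value = entry.getValue() == null ? "" : entry.getValue();
            sb.append(entry.getKey()).append(": ").append(value)
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data standing in for values read from an AcroFields object (assumption).
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Jane Doe");
        values.put("subscribe", "Yes");
        System.out.print(toText(values));
    }
}
```

The returned string can be appended to the same text file that receives the page text, so the output contains both the static content and the filled-in values.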
