itext: how to check if a giant string is present on the PDF page

Views: 157 Posted: 2018/11/16 17:41:54 Tags: java pdf itext itextpdf


Question

- I am using the iText library to create and read PDFs in my Java project.
- I am reading multiple text files of any extension (pdf, doc, word, etc.) and writing their content to a new PDF (the content of all the files joined together).
- To separate the content of each file in the giant PDF, I always start a new page, write the exact path of the file in red at the top of that page, and then write the content of the file.

Problem:

I want to record how many pages each file takes up in this PDF.

How do I check whether a string is present on a PDF page? I have all the file paths, so I would like to check whether any of the paths is written on the page.

I was following this tutorial to extract the text of any of my pages: http://www.quicklyjava.com/read-pdf-file-in-java-using-itext/

But when I extract the text of a page and check whether one of my file paths is present on it (using string.contains(...)), the system doesn't find my file path on the PDF page! I looked into why this happens, and when I printed one page's text, it looked like this:

1.
PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/
src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java
package br.ufrn.pairg.pdfgenerator;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public...

When I checked whether the file path "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java" was present in this giant string, the system didn't find it. Can you see the problem? My path is so long that it occupies two lines, so the extracted text contains a line break in the middle of it. That's the problem!
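A minimal sketch of why string.contains(...) fails here and of one possible workaround, assuming the only difference between the extracted text and the path is the line break inserted where the path wraps. The names WrappedPathSearch and containsIgnoringWhitespace are hypothetical and not part of iText, and the page text is hard-coded from the output above instead of being extracted from a PDF.

```java
class WrappedPathSearch {
    // Removes every whitespace character (spaces, tabs, line breaks).
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // True if needle occurs in pageText once whitespace is ignored on both sides.
    static boolean containsIgnoringWhitespace(String pageText, String needle) {
        return stripWhitespace(pageText).contains(stripWhitespace(needle));
    }

    public static void main(String[] args) {
        // Page text as printed in the question: the path wraps onto a second line.
        String pageText = "1.\n"
            + "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/\n"
            + "src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java\n"
            + "package br.ufrn.pairg.pdfgenerator;";
        String path = "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/"
            + "src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java";

        System.out.println(pageText.contains(path));                    // false
        System.out.println(containsIgnoringWhitespace(pageText, path)); // true
    }
}
```

Stripping all whitespace is a blunt comparison: two paths that differ only in where their spaces fall would be treated as equal, so this may not suit every set of file names.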

So, my question is: is there a way to check whether a giant string is present in the text of a PDF page using iText?

Recommended answer

Pages in a PDF file are organized using a page tree. Each leaf of the page tree is a page dictionary with keys and values. You could add a custom entry to the page dictionary like this:

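import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfString;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;

public void createPdf(String dest) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
    document.open();
    document.add(new Paragraph("Page 1"));
    document.newPage();
    document.add(new Paragraph("Page 2"));
    document.newPage();
    document.add(new Paragraph("Page 3"));
    document.newPage();
    document.add(new Paragraph("Page 4"));
    // Custom entry on page 4: the value is a PDF string
    writer.addPageDictEntry(new PdfName("ITXT_PageMarker"), new PdfString("Marker for page 4"));
    document.newPage();
    document.add(new Paragraph("Page 5"));
    document.newPage();
    document.add(new Paragraph("Page 6"));
    // Custom entry on page 6: the value is a PDF name
    writer.addPageDictEntry(new PdfName("ITXT_PageMarker"), new PdfName("PageMarker"));
    document.newPage();
    document.add(new Paragraph("Page 7"));
    // Custom entry on page 7: the value is a PDF number
    writer.addPageDictEntry(new PdfName("ITXT_PageMarker"), new PdfNumber(7));
    document.newPage();
    document.add(new Paragraph("Page 8"));
    document.close();
}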

If you look inside the PDF, the page dictionaries of pages 4, 6 and 7 now contain the custom ITXT_PageMarker entry. For the sake of this example, I added a PDF string for page 4, a PDF name for page 6 and a PDF number for page 7.

You can check for the presence of this custom key like this:

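import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import java.io.IOException;

public void check(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    PdfDictionary pagedict;
    // Page numbers are 1-based, so the loop must include the last page
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        pagedict = reader.getPageN(i);
        // Prints null for pages without the custom entry
        System.out.println(pagedict.get(new PdfName("ITXT_PageMarker")));
    }
    reader.close();
}

The output of this check() method for the eight pages is:

null
null
null
Marker for page 4
null
/PageMarker
7
null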
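
To connect this back to the question of how many pages each file occupies: a minimal sketch, assuming a hypothetical convention in which the first page of every file carries a marker holding that file's path, and pages without a marker belong to the most recently marked file. It works on a plain list of per-page marker values (as a loop like check() could collect them) so that it does not depend on iText; PageCounter and countPagesPerFile are illustrative names, not iText API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PageCounter {
    // markers.get(i) is the marker read from page i + 1, or null if that page has none.
    // Pages before the first marker are ignored.
    static Map<String, Integer> countPagesPerFile(List<String> markers) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String current = null;
        for (String marker : markers) {
            if (marker != null) {
                current = marker; // a new file starts on this page
            }
            if (current != null) {
                counts.merge(current, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical marker values for a five-page PDF built from two files.
        List<String> markers = Arrays.asList("a.java", null, null, "b.java", null);
        System.out.println(countPagesPerFile(markers)); // {a.java=3, b.java=2}
    }
}
```

Because the marker lives in the page dictionary rather than in the page content, it is not affected by how a long path wraps on the page.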

Important: You can't just invent new keys for the PDF syntax apart from those defined in ISO 32000. However, you can create your own custom keys if you register a 4-character prefix with ISO. For instance, Adobe registered ADBE and iText registered ITXT. If you introduce new custom keys, you should put the registered prefix at the start of the key name. For instance, at iText we can use ITXT_PageMarker, ITXT_custom, ITXT_Whatever, and so on. This rule prevents two different companies from introducing the same key with different meanings.
