Searching PDF for a specific string in JavaScript action in iText


Problem description

My aim is to look for JavaScript matching a given pattern in the annotations of a PDF. To do so, I have come up with the following code:

public static void main(String[] args) {

        try {

            // Reads and parses a PDF document
            PdfReader reader = new PdfReader("Test.pdf");

            // For each PDF page
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {

                // Get a page a PDF page
                PdfDictionary page = reader.getPageN(i);
                // Get all the annotations of page i
                PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

                // If page does not have annotations
                if (page.getAsArray(PdfName.ANNOTS) == null) {
                    continue;
                }

                // For each annotation
                for (int j = 0; j < annotsArray.size(); ++j) {

                    // For current annotation
                    PdfDictionary curAnnot = annotsArray.getAsDict(j);

                    // check if has JS as described below
                 PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
                 // test if it is a JavaScript action
                 if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
                 // what here?
                 }


                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

As far as I know, comparing strings is done by the StringCompare library. The thing is that it compares two strings, whereas I want to know whether the JavaScript action in an annotation starts with (or contains) this string: if (this.hostContainer) { try {

So, how do I check if JavaScript in annotations contains the above-mentioned string?

EDIT
A sample page with JS can be found at: pdf with JS

Recommended answer

JavaScript actions are defined as follows in ISO 32000-1:


12.6.4.16 JavaScript Actions

Upon invocation of a JavaScript action, a conforming processor shall execute a script that is written in the JavaScript programming language. Depending on the nature of the script, various interactive form fields in the document may update their values or change their visual appearances. Mozilla Development Center’s Client-Side JavaScript Reference and the Adobe JavaScript for Acrobat API Reference (see the Bibliography) give details on the contents and effects of JavaScript scripts. Table 217 shows the action dictionary entries specific to this type of action.

Table 217 - Additional entries specific to a JavaScript action

Key: S
Type: name
Value: (Required) The type of action that this dictionary describes; shall be JavaScript for a JavaScript action.

Key: JS
Type: text string or text stream
Value: (Required) A text string or text stream containing the JavaScript script to be executed. PDFDocEncoding or Unicode encoding (the latter identified by the Unicode prefix U+FEFF) shall be used to encode the contents of the string or stream.

To support the use of parameterized function calls in JavaScript scripts, the JavaScript entry in a PDF document’s name dictionary (see 7.7.4, "Name Dictionary") may contain a name tree that maps name strings to document-level JavaScript actions. When the document is opened, all of the actions in this name tree shall be executed, defining JavaScript functions for use by other scripts in the document.
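The encoding rule quoted above for the JS entry matters once you have the raw bytes of a script in hand. The following is only a minimal sketch of that rule in plain Java, without iText: the class name JsTextDecoder and its decode method are hypothetical helpers, and ISO-8859-1 is used here merely as an approximation of PDFDocEncoding (the two agree for plain ASCII script text, but not for every byte value).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of iText: decodes the raw bytes of a JS entry.
public class JsTextDecoder {

    /**
     * Returns the script text for the given raw bytes.
     * Bytes starting with the marker FE FF (U+FEFF) are read as UTF-16BE.
     * Anything else is read as ISO-8859-1, which is an ASSUMPTION standing in
     * for PDFDocEncoding and is only exact for plain ASCII content.
     */
    public static String decode(byte[] raw) {
        boolean hasUnicodePrefix = raw.length >= 2
                && (raw[0] & 0xFF) == 0xFE
                && (raw[1] & 0xFF) == 0xFF;
        if (hasUnicodePrefix) {
            // Skip the two prefix bytes, decode the rest as big-endian UTF-16
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] plain = "if (this.hostContainer) { try {"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] unicode = {(byte) 0xFE, (byte) 0xFF, 0, 'i', 0, 'f'};

        System.out.println(decode(plain));
        System.out.println(decode(unicode));
    }
}
```

For script text held in a PdfString, iText's toUnicodeString() already takes care of this distinction; a helper like the one above would only be of interest for bytes read from a stream.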

Thus, if you are interested to know if JavaScript action in annotations starts with (or contains) this string: if (this.hostContainer) { try { in the situation

 if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
 // what here?
 }

you likely will want to first check whether AnnotationAction.Get(PdfName.JS) is a PdfString or a PdfStream, in either case retrieve the content as string, and check whether it or any of the functions it calls (the function might be defined in the JavaScript name tree) contains the string you search using usual string comparison methods.
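Once the script is available as a string, the "usual string comparison methods" are simply String.startsWith and String.contains. As a sketch of one possible refinement, the hypothetical class below (JsPatternMatcher, not part of iText or of the original code) collapses runs of whitespace before comparing, on the assumption that scripts in other files may break the same statement across lines differently. It does not cover differences such as "if(" versus "if (".

```java
// Hypothetical helper, not part of iText: whitespace-tolerant pattern checks.
public class JsPatternMatcher {

    /** Trims the text and collapses every run of whitespace into one space. */
    public static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    /** True if the script starts with the pattern, ignoring whitespace layout. */
    public static boolean startsWithPattern(String script, String pattern) {
        return normalize(script).startsWith(normalize(pattern));
    }

    /** True if the script contains the pattern, ignoring whitespace layout. */
    public static boolean containsPattern(String script, String pattern) {
        return normalize(script).contains(normalize(pattern));
    }

    public static void main(String[] args) {
        String pattern = "if (this.hostContainer) { try {";
        // Same statement as the pattern, but with a line break inside it
        String script = "if (this.hostContainer) {\n  try {\n"
                + "this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);\n"
                + "} catch(e) { console.println(e); }};";

        System.out.println("exact contains:      " + script.contains(pattern));
        System.out.println("normalized contains: " + containsPattern(script, pattern));
        System.out.println("normalized starts:   " + startsWithPattern(script, pattern));
    }
}
```

If the scripts in your files always use exactly the same layout, a plain script.contains(...) call is enough and the normalization step can be dropped.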

I took your code, cleaned it a bit (in particular it was a mix of C# and Java) and added code as described above inspecting the immediate JavaScript code in the annotation action element:

System.out.println("file.pdf - Looking for special JavaScript actions.");
// Reads and parses a PDF document
PdfReader reader = new PdfReader(resource);

// For each PDF page
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    System.out.printf("\nPage %d\n", i);
    // Get a PDF page
    PdfDictionary page = reader.getPageN(i);
    // Get all the annotations of page i
    PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

    // If page does not have annotations
    if (annotsArray == null)
    {
        System.out.println("No annotations.");
        continue;
    }

    // For each annotation
    for (int j = 0; j < annotsArray.size(); ++j)
    {
        System.out.printf("Annotation %d - ", j);

        // For current annotation
        PdfDictionary curAnnot = annotsArray.getAsDict(j);

        // check if has JS as described below
        PdfDictionary annotationAction = curAnnot.getAsDict(PdfName.A);
        if (annotationAction == null)
        {
            System.out.print("no action");
        }
        // test if it is a JavaScript action
        else if (PdfName.JAVASCRIPT.equals(annotationAction.get(PdfName.S)))
        {
            PdfObject scriptObject = annotationAction.getDirectObject(PdfName.JS);
            if (scriptObject == null)
            {
                System.out.print("missing JS entry");
                continue;
            }
            final String script;
            if (scriptObject.isString())
                script = ((PdfString)scriptObject).toUnicodeString();
            else if (scriptObject.isStream())
            {
                try (   ByteArrayOutputStream baos = new ByteArrayOutputStream()    )
                {
                    ((PdfStream)scriptObject).writeContent(baos);
                    script = baos.toString("ISO-8859-1");
                }
            }
            else
            {
                System.out.println("malformed JS entry");
                continue;
            }

            if (script.contains("if (this.hostContainer) { try {"))
                System.out.print("contains test string - ");

            System.out.printf("\n---\n%s\n---", script);
            // what here?
        }
        else
        {
            System.out.print("no JavaScript action");
        }
        System.out.println();
    }
}

(Test SearchActionJavaScript, method testSearchJsActionInFile)

using (PdfReader reader = new PdfReader(sourcePath))
{
    Console.WriteLine("file.pdf - Looking for special JavaScript actions.");

    // For each PDF page
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        Console.Write("\nPage {0}\n", i);
        // Get a PDF page
        PdfDictionary page = reader.GetPageN(i);
        // Get all the annotations of page i
        PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);

        // If page does not have annotations
        if (annotsArray == null)
        {
            Console.WriteLine("No annotations.");
            continue;
        }

        // For each annotation
        for (int j = 0; j < annotsArray.Size; ++j)
        {
            Console.Write("Annotation {0} - ", j);

            // For current annotation
            PdfDictionary curAnnot = annotsArray.GetAsDict(j);

            // check if has JS as described below
            PdfDictionary annotationAction = curAnnot.GetAsDict(PdfName.A);
            if (annotationAction == null)
            {
                Console.Write("no action");
            }
            // test if it is a JavaScript action
            else if (PdfName.JAVASCRIPT.Equals(annotationAction.Get(PdfName.S)))
            {
                PdfObject scriptObject = annotationAction.GetDirectObject(PdfName.JS);
                if (scriptObject == null)
                {
                    Console.WriteLine("missing JS entry");
                    continue;
                }
                String script;
                if (scriptObject.IsString())
                    script = ((PdfString)scriptObject).ToUnicodeString();
                else if (scriptObject.IsStream())
                {
                    using (MemoryStream stream = new MemoryStream())
                    {
                        ((PdfStream)scriptObject).WriteContent(stream);
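                        // MemoryStream.ToString() only returns the type name; decode the bytes instead
                        script = System.Text.Encoding.GetEncoding("ISO-8859-1").GetString(stream.ToArray());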
                    }
                }
                else
                {
                    Console.WriteLine("malformed JS entry");
                    continue;
                }

                if (script.Contains("if (this.hostContainer) { try {"))
                    Console.Write("contains test string - ");

                Console.Write("\n---\n{0}\n---", script);
                // what here?
            }
            else
            {
                Console.Write("no JavaScript action");
            }
            Console.WriteLine();
        }
    }
}



Output

When running either version against your sample file, one gets:

file.pdf - Looking for special JavaScript actions.

Page 1
Annotation 0 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);
} catch(e) { console.println(e); }};
---
Annotation 1 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_ix', 0]);
} catch(e) { console.println(e); }};
---
Annotation 2 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_xi', 0]);
} catch(e) { console.println(e); }};
---
Annotation 3 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_3', 0]);
} catch(e) { console.println(e); }};
---
Annotation 4 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_15', 0]);
} catch(e) { console.println(e); }};
---
Annotation 5 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_37', 0]);
} catch(e) { console.println(e); }};
---
Annotation 6 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_57', 0]);
} catch(e) { console.println(e); }};
---
Annotation 7 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_81', 0]);
} catch(e) { console.println(e); }};
---
Annotation 8 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_111', 0]);
} catch(e) { console.println(e); }};
---
Annotation 9 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_136', 0]);
} catch(e) { console.println(e); }};
---
Annotation 10 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_160', 0]);
} catch(e) { console.println(e); }};
---
Annotation 11 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_197', 0]);
} catch(e) { console.println(e); }};
---
Annotation 12 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_179', 0]);
} catch(e) { console.println(e); }};
---
Annotation 13 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_201', 0]);
} catch(e) { console.println(e); }};
---
Annotation 14 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_223', 0]);
} catch(e) { console.println(e); }};
---

Page 2
No annotations.

Page 3
No annotations.
