使用pdfbox替换pdf中的文本时出现错误字符 [英] Bad characters when replacing text in pdf using pdfbox

查看:793
本文介绍了使用pdfbox替换pdf中的文本时出现错误字符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试替换pdf中的文本,这有点替换,这是我的代码

I'm trying to replace text in pdf and it's kind of replaced, this is my code

PDDocument doc = null;
    int occurrences = 0;
    try {
        doc = PDDocument.load("test.pdf"); //Input PDF File Name
        List pages = doc.getDocumentCatalog().getAllPages();
        for (int i = 0; i < pages.size(); i++) {
            PDPage page = (PDPage) pages.get(i);
            PDStream contents = page.getContents();
            PDFStreamParser parser = new PDFStreamParser(contents.getStream());
            parser.parse();
            List tokens = parser.getTokens();
            for (int j = 0; j < tokens.size(); j++) {
                Object next = tokens.get(j);
                if (next instanceof PDFOperator) {
                    PDFOperator op = (PDFOperator) next;
                    // Tj and TJ are the two operators that display strings in a PDF
                    if (op.getOperation().equals("Tj")) {
                        // Tj takes one operator and that is the string
                        // to display so lets update that operator
                        COSString previous = (COSString) tokens.get(j - 1);
                        String string = previous.getString();
                        if (string.contains("Good")) {
                            string = string.replace("Good", "Bad");
                            occurrences++;
                        }
                        //Word you want to change. Currently this code changes word "Good" to "Bad"
                        previous.reset();
                        previous.append(string.getBytes("ISO-8859-1"));
                    } else if (op.getOperation().equals("TJ")) {
                        COSArray previous = (COSArray) tokens.get(j - 1);
                        COSString temp = new COSString();

                        String tempString = "";
                        for (int t = 0; t < previous.size(); t++) {

                            if (previous.get(t) instanceof COSString) {
                                tempString += ((COSString) previous.get(t)).getString();

                            }
                        }

                        temp.append(tempString.getBytes("ISO-8859-1"));
                        tempString = "";
                        tempString = temp.getString();
                        if (tempString.contains("Good")) {
                            tempString = tempString.replace("Good", "Bad");
                            occurrences++;
                        }
                        previous.clear();

                        String[] stringArray = tempString.split(" ");

                        for (String string : stringArray) {
                            COSString cosString = new COSString();
                            string = string + " ";
                            cosString.append(string.getBytes("ISO-8859-1"));
                            previous.add(cosString);
                        }

                    }
                }
            }
            // now that the tokens are updated we will replace the page content stream.
            PDStream updatedStream = new PDStream(doc);
            OutputStream out = updatedStream.createOutputStream();
            ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
            tokenWriter.writeTokens(tokens);
            page.setContents(updatedStream);
        }
        System.out.println("number of matches found: " + occurrences);
        doc.save("a.pdf"); //Output file name
    } catch (IOException ex) {
        Logger.getLogger(ReplaceTextInPDF.class.getName()).log(Level.SEVERE, null, ex);
    } catch (COSVisitorException ex) {
        Logger.getLogger(ReplaceTextInPDF.class.getName()).log(Level.SEVERE, null, ex);
    } finally {
        if (doc != null) {
            try {
                doc.close();
            } catch (IOException ex) {
                Logger.getLogger(ReplaceTextInPDF.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }

将其替换为不良字符或隐藏形状的问题(例如,不良词仅变为d字符),但是如果我将其复制并粘贴到其他位置,则会正确粘贴预期的词, 另外,当我在生成的pdf中搜索新单词时,找不到它,但是当我搜索旧单词时,它在替换的地方找到了它

the issue that it's replaced in a bad characters or hidden shape ( as example the bad word becomes only d character), but if i copy and paste it in another place it paste the expected word correctly, also when i search the generated pdf for the new word it doesn't find it, but when i search with the old word it finds it in the replaced places

推荐答案

我发现aspose了,该链接显示了如何使用它替换pdf中的文本,它很容易并且很完美,除了它不是免费的,因此免费版本是在pdf文件页面的顶部打印版权行 http://www.aspose .com/docs/display/pdfjava/Replace + Text + in + Pages of of + a + PDF + Document

I found aspose, this link shows how to use it to replace text in pdfs, it's easy and works perfect except that it's not free, so the free version is printing copyrights line on the head of pdf file pages http://www.aspose.com/docs/display/pdfjava/Replace+Text+in+Pages+of+a+PDF+Document

这篇关于使用pdfbox替换pdf中的文本时出现错误字符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆