Bad characters when replacing text in a PDF using PDFBox

Views: 793  Published: 2020/5/25 5:15:55  Tags: java pdf pdfbox

Question

I'm trying to replace text in a PDF, and the replacement only partly works. This is my code:

    PDDocument doc = null;
    int occurrences = 0;
    try {
        doc = PDDocument.load("test.pdf"); //Input PDF File Name
        List pages = doc.getDocumentCatalog().getAllPages();
        for (int i = 0; i < pages.size(); i++) {
            PDPage page = (PDPage) pages.get(i);
            PDStream contents = page.getContents();
            PDFStreamParser parser = new PDFStreamParser(contents.getStream());
            parser.parse();
            List tokens = parser.getTokens();
            for (int j = 0; j < tokens.size(); j++) {
                Object next = tokens.get(j);
                if (next instanceof PDFOperator) {
                    PDFOperator op = (PDFOperator) next;
                    // Tj and TJ are the two operators that display strings in a PDF
                    if (op.getOperation().equals("Tj")) {
                        // Tj takes one operand, the string to display,
                        // so update that operand
                        COSString previous = (COSString) tokens.get(j - 1);
                        String string = previous.getString();
                        // Word to change: this code replaces "Good" with "Bad"
                        if (string.contains("Good")) {
                            string = string.replace("Good", "Bad");
                            occurrences++;
                        }
                        previous.reset();
                        previous.append(string.getBytes("ISO-8859-1"));
                    } else if (op.getOperation().equals("TJ")) {
                        COSArray previous = (COSArray) tokens.get(j - 1);
                        COSString temp = new COSString();

                        String tempString = "";
                        for (int t = 0; t < previous.size(); t++) {

                            if (previous.get(t) instanceof COSString) {
                                tempString += ((COSString) previous.get(t)).getString();

                            }
                        }

                        temp.append(tempString.getBytes("ISO-8859-1"));
                        tempString = "";
                        tempString = temp.getString();
                        if (tempString.contains("Good")) {
                            tempString = tempString.replace("Good", "Bad");
                            occurrences++;
                        }
                        previous.clear();

                        String[] stringArray = tempString.split(" ");

                        for (String string : stringArray) {
                            COSString cosString = new COSString();
                            string = string + " ";
                            cosString.append(string.getBytes("ISO-8859-1"));
                            previous.add(cosString);
                        }

                    }
                }
            }
            // now that the tokens are updated we will replace the page content stream.
            PDStream updatedStream = new PDStream(doc);
            OutputStream out = updatedStream.createOutputStream();
            ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
            tokenWriter.writeTokens(tokens);
            page.setContents(updatedStream);
        }
        System.out.println("number of matches found: " + occurrences);
        doc.save("a.pdf"); //Output file name
    } catch (IOException ex) {
        Logger.getLogger(ReplaceTextInPDF.class.getName()).log(Level.SEVERE, null, ex);
    } catch (COSVisitorException ex) {
        Logger.getLogger(ReplaceTextInPDF.class.getName()).log(Level.SEVERE, null, ex);
    } finally {
        if (doc != null) {
            try {
                doc.close();
            } catch (IOException ex) {
                Logger.getLogger(ReplaceTextInPDF.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }
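
The code above writes the replacement text back with `getBytes("ISO-8859-1")`. As a minimal sketch in plain Java (no PDFBox involved; `Latin1Check` and `survivesLatin1` are hypothetical names introduced here), the check below shows only the Java side of that step: characters outside ISO-8859-1 are silently replaced during encoding. It does not tell whether the PDF's font maps those bytes to the intended glyphs, which depends on the font's own encoding.

```java
import java.nio.charset.StandardCharsets;

class Latin1Check {

    // Returns true if the string is unchanged after an ISO-8859-1 round trip.
    // Unmappable characters are replaced with '?' while encoding, so the
    // round trip differs for them.
    static boolean survivesLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1).equals(s);
    }

    public static void main(String[] args) {
        System.out.println(survivesLatin1("Bad"));       // true
        System.out.println(survivesLatin1("Bad\u2713")); // false (check mark is not in Latin-1)
    }
}
```

A word like "Bad" passes this check, so in this question the encoding call itself is unlikely to be what drops characters; the font side is worth inspecting next.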

The problem is that the replaced text shows up as bad characters or is partly hidden (for example, the word "Bad" appears as only the character "d"). However, if I copy the text and paste it somewhere else, the expected word is pasted correctly. Also, when I search the generated PDF for the new word, it is not found, but when I search for the old word, it is found at the replaced places.
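
One possible explanation for symptoms like these is an embedded subset font. This is an assumption, since the question does not show the PDF's fonts. A subset font contains glyphs only for the characters the document already used, and its name starts with a tag of six uppercase letters followed by `+`. The sketch below is plain Java without PDFBox, and `SubsetFontCheck` and its methods are hypothetical helpers introduced for illustration. It checks a font name for that tag and lists which characters of a replacement word never occur in a sample of text known to use the font.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SubsetFontCheck {

    // Subset font names carry a six-uppercase-letter tag and '+',
    // for example "ABCDEF+Arial".
    static boolean isSubsetFontName(String baseFontName) {
        return baseFontName != null && baseFontName.matches("[A-Z]{6}\\+.+");
    }

    // Characters of the replacement that never occur in textUsingFont.
    // With a subset font these MAY have no glyph; the real glyph set depends
    // on all text drawn with that font, not only on this sample.
    static Set<Character> possiblyMissingGlyphs(String textUsingFont, String replacement) {
        Set<Character> missing = new LinkedHashSet<>();
        for (char c : replacement.toCharArray()) {
            if (textUsingFont.indexOf(c) < 0) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(isSubsetFontName("ABCDEF+Arial"));     // true
        System.out.println(possiblyMissingGlyphs("Good", "Bad")); // [B, a]
    }
}
```

For "Good" replaced by "Bad", only "d" is shared between the two words, which would be consistent with the output showing just "d" if the font were a subset that lacks "B" and "a".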
