Apache POI: ${my_placeholder} is treated as three different runs

Problem description

I have a .docx template with placeholders to be filled, such as ${programming_language}, ${education}, etc.

The placeholder keywords must be easy to distinguish from ordinary words, so they are enclosed in ${ }.

for (XWPFTable table : doc.getTables()) {
  for (XWPFTableRow row : table.getRows()) {
    for (XWPFTableCell cell : row.getTableCells()) {
      for (XWPFParagraph paragraph : cell.getParagraphs()) {
        for (XWPFRun run : paragraph.getRuns()) {
          System.out.println("run text: " + run.text());
          /** replace text here, etc. */
        }
      }
    }
  }
}

I want to extract the placeholders together with the enclosing ${ } characters. The problem is that the enclosing characters seem to be treated as separate runs:

run text: ${
run text: programming_language
run text: }
run text: Some plain text here 
run text: ${
run text: education
run text: }

Instead, I would like to achieve the following effect:

run text: ${programming_language}
run text: Some plain text here
run text: ${education}

I have tried using other enclosing characters, such as: { }, < >, # #, etc.

I do not want to do some weird concatenations of runs, etc. I want to have it in a single XWPFRun.

If I cannot find a proper solution, I think I will just name them like this: VAR_PROGRAMMING_LANGUAGE, VAR_EDUCATION.

Solution

Apache POI (4.1.2 at the time of writing) provides TextSegment to deal with these Word text-run issues. XWPFParagraph.searchText searches for a string in a paragraph and returns a TextSegment. The segment gives access to the begin run and the end run of the found text within the paragraph (BeginRun and EndRun), to the start character position in the begin run and the end character position in the end run (BeginChar and EndChar), and to the index of the text element within the run (BeginText and EndText). The text element index should always be 0, because default text runs have only one text element.
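To make the meaning of those positions concrete, here is a minimal sketch that models runs as plain strings and reports where a search string begins and ends. It does not use POI: the class RunSegmentSketch, its nested Segment class and the find method are hypothetical names for illustration, and the lookup (join the run texts, then map the offsets back to runs) is a simplification rather than the algorithm POI itself uses.

```java
import java.util.Arrays;
import java.util.List;

public class RunSegmentSketch {

    /** Hypothetical stand-in for POI's TextSegment: run indexes plus character offsets. */
    static final class Segment {
        final int beginRun, beginChar, endRun, endChar;

        Segment(int beginRun, int beginChar, int endRun, int endChar) {
            this.beginRun = beginRun;
            this.beginChar = beginChar;
            this.endRun = endRun;
            this.endChar = endChar;
        }

        @Override
        public String toString() {
            return "begin " + beginRun + ":" + beginChar + ", end " + endRun + ":" + endChar;
        }
    }

    /** Returns the position of the first occurrence of searched across the run texts, or null. */
    static Segment find(List<String> runs, String searched) {
        if (searched.isEmpty()) {
            return null;
        }
        String joined = String.join("", runs);
        int start = joined.indexOf(searched);
        if (start < 0) {
            return null;
        }
        int end = start + searched.length() - 1; // inclusive index of the last matched character
        int beginRun = -1, beginChar = -1, offset = 0;
        for (int i = 0; i < runs.size(); i++) {
            int len = runs.get(i).length();
            if (beginRun < 0 && start < offset + len) { // the match starts inside this run
                beginRun = i;
                beginChar = start - offset;
            }
            if (end < offset + len) { // the match ends inside this run
                return new Segment(beginRun, beginChar, i, end - offset);
            }
            offset += len;
        }
        return null;
    }

    public static void main(String[] args) {
        // Run texts split the way Word split them in the question
        List<String> runs = Arrays.asList("Knows ", "${", "programming_language", "}", " well");
        System.out.println(find(runs, "${programming_language}"));
    }
}
```

For the run list in main, the match begins in run 1 at character 0 and ends in run 3 at character 0, which is the kind of information a TextSegment carries.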

Having this, we can do the following:

Replace the part of the searched string found in the begin run with the replacement. To do so, take the text that precedes the searched string in that run and append the replacement to it. After that, the begin run contains the whole replacement.

Delete all text runs between the begin run and the end run, since they contain only parts of the searched string that are no longer needed.

In the end run, keep only the text that follows the searched string.

This way we can replace text that is spread across multiple text runs.

The following example shows this.

import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

public class WordReplaceTextSegment {

 static public void replaceTextSegment(XWPFParagraph paragraph, String textToFind, String replacement) {
  TextSegment foundTextSegment = null;
  PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
  while((foundTextSegment = paragraph.searchText(textToFind, startPos)) != null) { // search all text segments having text to find

   // debug output: run index, text element index and character position of the match
   System.out.println(foundTextSegment.getBeginRun()+":"+foundTextSegment.getBeginText()+":"+foundTextSegment.getBeginChar());
   System.out.println(foundTextSegment.getEndRun()+":"+foundTextSegment.getEndText()+":"+foundTextSegment.getEndChar());

   // maybe there is text before textToFind in begin run
   XWPFRun beginRun = paragraph.getRuns().get(foundTextSegment.getBeginRun());
   String textInBeginRun = beginRun.getText(foundTextSegment.getBeginText());
   String textBefore = textInBeginRun.substring(0, foundTextSegment.getBeginChar()); // we only need the text before

   // maybe there is text after textToFind in end run
   XWPFRun endRun = paragraph.getRuns().get(foundTextSegment.getEndRun());
   String textInEndRun = endRun.getText(foundTextSegment.getEndText());
   String textAfter = textInEndRun.substring(foundTextSegment.getEndChar() + 1); // we only need the text after

   if (foundTextSegment.getEndRun() == foundTextSegment.getBeginRun()) { 
    textInBeginRun = textBefore + replacement + textAfter; // if we have only one run, we need the text before, then the replacement, then the text after in that run
   } else {
    textInBeginRun = textBefore + replacement; // else we need the text before followed by the replacement in begin run
    endRun.setText(textAfter, foundTextSegment.getEndText()); // and the text after in end run
   }

   beginRun.setText(textInBeginRun, foundTextSegment.getBeginText());

   // runs between begin run and end run need to be removed;
   // iterate backwards so that removing a run does not shift the indexes still to be visited
   for (int runBetween = foundTextSegment.getEndRun() - 1; runBetween > foundTextSegment.getBeginRun(); runBetween--) {
    paragraph.removeRun(runBetween); // remove runs that are no longer needed
   }

  }
 }

 public static void main(String[] args) throws Exception {

  String textToFind = "${This is the text to find}"; // might be in different runs
  String replacement = "Replacement text";

  // try-with-resources closes the document and both streams even if an exception is thrown
  try (FileInputStream in = new FileInputStream("source.docx");
       XWPFDocument doc = new XWPFDocument(in);
       FileOutputStream out = new FileOutputStream("result.docx")) {

   // body paragraphs only; paragraphs inside table cells must be reached via doc.getTables()
   for (XWPFParagraph paragraph : doc.getParagraphs()) { // go through all paragraphs
    if (paragraph.getText().contains(textToFind)) { // paragraph contains text to find
     replaceTextSegment(paragraph, textToFind, replacement);
    }
   }

   doc.write(out);
  }

 }
}
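
To see the three replacement steps in isolation, here is a minimal model in which a list of strings stands in for the runs of a paragraph. It is a sketch for illustration only: RunReplaceSketch and its replace method are hypothetical names, not part of POI, and the begin and end positions are passed in directly instead of coming from a TextSegment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunReplaceSketch {

    /**
     * Replaces the text from runs[beginRun] at beginChar up to and including
     * runs[endRun] at endChar with the replacement. Plain strings stand in for runs.
     */
    static void replace(List<String> runs, int beginRun, int beginChar,
                        int endRun, int endChar, String replacement) {
        String textBefore = runs.get(beginRun).substring(0, beginChar);
        String textAfter = runs.get(endRun).substring(endChar + 1);

        if (beginRun == endRun) {
            // Single run: text before, then the replacement, then text after
            runs.set(beginRun, textBefore + replacement + textAfter);
            return;
        }

        runs.set(beginRun, textBefore + replacement); // begin run now holds the whole replacement
        runs.set(endRun, textAfter);                  // end run keeps only the text after the match

        // Remove the runs in between, back to front so earlier indexes stay valid
        for (int i = endRun - 1; i > beginRun; i--) {
            runs.remove(i);
        }
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(
                Arrays.asList("Knows ", "${", "programming_language", "}", " well"));
        replace(runs, 1, 0, 3, 0, "Java");
        System.out.println(runs);
    }
}
```

After the call in main, the list holds four entries: "Knows ", "Java", an empty string and " well". The emptied end run is kept rather than removed, which mirrors what the POI code above does with endRun.setText.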

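The above code does not work in all cases, because XWPFParagraph.searchText has bugs. So here is a better searchText method (in addition to the imports above, it needs org.apache.xmlbeans.XmlCursor and org.apache.xmlbeans.XmlObject):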

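/**
 * Parses the paragraph and searches it for the string searched, starting at startPos.
 * If the string is found, a TextSegment holding its begin and end positions
 * (run, text element, character) is returned; otherwise null is returned.
 *
 * @param paragraph the paragraph to search
 * @param searched  the string to search for
 * @param startPos  the position in the paragraph at which to start searching
 * @return the TextSegment of the first match, or null if there is none
 */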
static TextSegment searchText(XWPFParagraph paragraph, String searched, PositionInParagraph startPos) {
    int startRun = startPos.getRun(),
        startText = startPos.getText(),
        startChar = startPos.getChar();
    int beginRunPos = 0, candCharPos = 0;
    boolean newList = false;

    //CTR[] rArray = paragraph.getRArray(); //This does not contain all runs. It lacks hyperlink runs for ex.
    java.util.List<XWPFRun> runs = paragraph.getRuns(); 
    
    int beginTextPos = 0, beginCharPos = 0; //must be outside the for loop
    
    //for (int runPos = startRun; runPos < rArray.length; runPos++) {
    for (int runPos = startRun; runPos < runs.size(); runPos++) {
        //int beginTextPos = 0, beginCharPos = 0, textPos = 0, charPos; //int beginTextPos = 0, beginCharPos = 0 must be outside the for loop
        int textPos = 0, charPos;
        //CTR ctRun = rArray[runPos];
        CTR ctRun = runs.get(runPos).getCTR();
        XmlCursor c = ctRun.newCursor();
        c.selectPath("./*");
        try {
            while (c.toNextSelection()) {
                XmlObject o = c.getObject();
                if (o instanceof CTText) {
                    if (textPos >= startText) {
                        String candidate = ((CTText) o).getStringValue();
                        if (runPos == startRun) {
                            charPos = startChar;
                        } else {
                            charPos = 0;
                        }

                        for (; charPos < candidate.length(); charPos++) {
                            if ((candidate.charAt(charPos) == searched.charAt(0)) && (candCharPos == 0)) {
                                beginTextPos = textPos;
                                beginCharPos = charPos;
                                beginRunPos = runPos;
                                newList = true;
                            }
                            if (candidate.charAt(charPos) == searched.charAt(candCharPos)) {
                                if (candCharPos + 1 < searched.length()) {
                                    candCharPos++;
                                } else if (newList) {
                                    TextSegment segment = new TextSegment();
                                    segment.setBeginRun(beginRunPos);
                                    segment.setBeginText(beginTextPos);
                                    segment.setBeginChar(beginCharPos);
                                    segment.setEndRun(runPos);
                                    segment.setEndText(textPos);
                                    segment.setEndChar(charPos);
                                    return segment;
                                }
                            } else {
                                candCharPos = 0;
                            }
                        }
                    }
                    textPos++;
                } else if (o instanceof CTProofErr) {
                    c.removeXml();
                } else if (o instanceof CTRPr) {
                    //do nothing
                } else {
                    candCharPos = 0;
                }
            }
        } finally {
            c.dispose();
        }
    }
    return null;
}

It is called like this:

...
while((foundTextSegment = searchText(paragraph, textToFind, startPos)) != null) {
...
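
One caveat about the loop as written: startPos is never advanced, so the loop ends only because each replacement removes the match it just found. If the replacement text itself contained textToFind, the search would keep finding it. The following is a minimal sketch of the usual remedy, resuming the search behind the inserted text, shown on a plain string rather than on POI runs. ReplaceAllSketch and its replaceAll method are hypothetical names for illustration.

```java
public class ReplaceAllSketch {

    /** Replaces every occurrence of textToFind, resuming the search after each replacement. */
    static String replaceAll(String text, String textToFind, String replacement) {
        if (textToFind.isEmpty()) {
            return text; // nothing sensible to search for
        }
        StringBuilder sb = new StringBuilder(text);
        int from = 0;
        int found;
        while ((found = sb.indexOf(textToFind, from)) >= 0) {
            sb.replace(found, found + textToFind.length(), replacement);
            from = found + replacement.length(); // skip past what was just inserted
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The replacement contains the search text; the loop still terminates
        System.out.println(replaceAll("${name} and ${name}", "${name}", "<${name}>"));
    }
}
```

Because the search resumes after the inserted text, each original occurrence is replaced exactly once, even when the replacement contains the placeholder.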
