Split an XWPFRun into multiple runs

Question

I am trying to modify existing Word documents by automatically bolding some keywords in them. As an example:

The quick brown fox jumps over the lazy dog. (1)

would become:

The quick brown fox jumps over the lazy dog. (2)

with the keywords "fox" and "dog" set in bold.

My issue is that (1) is one run, while (2) becomes 5 runs (5 because the period after "dog" is not bold, but that is a detail). I understand why there are multiple runs, and that is totally fine.

Question 1:

Is there a way to easily split a run into multiple runs within the same paragraph? I have not managed to do it.

Question 2:

As I did not manage to split a run, I tried to create a new paragraph and add the runs to it, which is really not ideal. I have managed to duplicate a paragraph entirely and modify the runs in the duplicated paragraph. I keep the styling (which is expected), but I lose the comments in the duplicated paragraph.

Ideally, I'd like to split the run in place (within the paragraph), but if that is not possible, I'd like a better cloner than this:

  public static void cloneRun(XWPFRun source, XWPFRun clone) {
    CTRPr rPr = clone.getCTR().isSetRPr()
        ? clone.getCTR().getRPr()
        : clone.getCTR().addNewRPr();
    rPr.set(source.getCTR().getRPr());
    clone.setText(source.getText(0));
  }

Recommended answer

In "How do I change color of a particular word document using apache poi?" I have shown an algorithm for splitting XWPFRuns for formatting reasons. That one only formats a single character, and it does not clone the run properties, but it shows the basics. We have to work at the level of the entire paragraph, since only the paragraph has methods for inserting runs. And we need to loop over the run text character by character, since every approach that splits the text into words runs into problems with punctuation marks when the words are reassembled into a paragraph afterwards.
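The punctuation problem can be seen on plain strings, without POI. The following is only a small sketch; the class name, the helper method and the sample sentence (with a deliberate double space) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class WordSplitPitfall {

    // The "split into words" approach: split on any whitespace.
    static List<String> words(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy  dog.";
        List<String> words = words(text);

        // The last token still carries the period, so an exact match on "dog" fails.
        System.out.println(words.contains("dog"));  // false
        System.out.println(words.contains("dog.")); // true

        // Re-joining with single spaces does not reproduce the original text,
        // because the double space before "dog." was lost while splitting.
        System.out.println(String.join(" ", words).equals(text)); // false
    }
}
```

Buffering character by character avoids both effects, because no character of the original run text is ever dropped or re-invented.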

What is missing is a method for cloning the run properties from the original run to the newly added ones. This can be done by cloning the underlying w:rPr element.

The whole approach is then to go through all runs in the paragraph. If a run contains the keyword, split its text into characters, go through those characters and buffer them. If the buffered character stream ends with the keyword, set all currently buffered characters except the keyword as the text of the current run. Then insert a new run for the formatted keyword and clone the run properties from the original run. Set the keyword as the text of that run and apply the additional formatting. Then insert a new run for the characters that follow and clone the run properties from the original run again. Continue like this for each run in the paragraph.
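The buffering step on its own can be sketched on plain strings before looking at the POI version. This is only an illustration under assumptions: the class and method names are hypothetical, the keyword is assumed to be non-empty, and each list entry stands in for the text of one run.

```java
import java.util.ArrayList;
import java.util.List;

class RunSplitSketch {

    // Splits runText so that every occurrence of keyword becomes a piece of its own.
    // Each piece corresponds to the text of one run after splitting.
    static List<String> split(String runText, String keyword) {
        List<String> pieces = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (char c : runText.toCharArray()) {
            buffer.append(c); // buffer all characters
            String buffered = buffer.toString();
            if (buffered.endsWith(keyword)) {
                // everything buffered before the keyword stays in the current piece
                pieces.add(buffered.substring(0, buffered.length() - keyword.length()));
                // the keyword gets its own piece (the run that would be formatted)
                pieces.add(keyword);
                buffer.setLength(0); // empty the buffer
            }
        }
        pieces.add(buffer.toString()); // characters after the last keyword
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(split("The quick brown fox jumps over the lazy dog.", "fox"));
    }
}
```

Note that a keyword at the very start or end of the text yields an empty piece, just as the same approach applied to runs can leave an empty run behind.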

Complete example:

import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;

import java.util.*;
import java.awt.Desktop;

public class WordFormatWords {

 static void cloneRunProperties(XWPFRun source, XWPFRun dest) { // clones the underlying w:rPr element
  CTR tRSource = source.getCTR();
  CTRPr rPrSource = tRSource.getRPr();
  if (rPrSource != null) {
   CTRPr rPrDest = (CTRPr)rPrSource.copy();
   CTR tRDest = dest.getCTR();
   tRDest.setRPr(rPrDest);
  }
 }

 static void formatWord(XWPFParagraph paragraph, String keyword, Map<String, String> formats) {
  int runNumber = 0;
  while (runNumber < paragraph.getRuns().size()) { //go through all runs, we cannot use for each since we will possibly insert new runs
   XWPFRun run = paragraph.getRuns().get(runNumber);
   XWPFRun run2 = run;
   String runText = run.getText(0);
   if (runText != null && runText.contains(keyword)) { //if we have a run with keyword in it, then

    // This code part is to manage comment ranges.
    // Do we have a commentRangeEnd immediately after the run?
    // If so, remember its position in a cursor.
    XmlCursor commentRangeEndCursor = null;
    XmlCursor cursor = run.getCTR().newCursor();
    cursor.toEndToken();
    if (cursor.hasNextToken()) {
     cursor.toNextToken();
     XmlObject commentRangeEnd = cursor.getObject();
     if (commentRangeEnd instanceof CTMarkupRange) {
      commentRangeEndCursor = cursor;
     }
    }
    if (commentRangeEndCursor == null) {
     cursor.dispose(); // no markup range found, so this cursor is not needed any further
    }

    char[] runChars = runText.toCharArray(); //split run text into characters
    StringBuilder sb = new StringBuilder();
    for (int charNumber = 0; charNumber < runChars.length; charNumber++) { //go through all characters in that run
     sb.append(runChars[charNumber]); //buffer all characters
     runText = sb.toString();
     if (runText.endsWith(keyword)) { //if the buffered character stream ends with the keyword
      //set all chars, which are currently buffered, except the keyword, as the text of the current run
      run.setText(runText.substring(0, runText.length() - keyword.length()), 0);
      run2 = paragraph.insertNewRun(++runNumber); //insert new run for the formatted keyword
      cloneRunProperties(run, run2); // clone the run properties from original run
      run2.setText(keyword, 0); // set the keyword in run
      for (String toSet : formats.keySet()) { // do the additional formatting
       if ("color".equals(toSet)) {
        run2.setColor(formats.get(toSet));
       } else if ("bold".equals(toSet)) {
        run2.setBold(Boolean.parseBoolean(formats.get(toSet)));
       }
      }
      run2 = paragraph.insertNewRun(++runNumber); //insert a new run for the next characters
      cloneRunProperties(run, run2); // clone the run properties from original run
      run = run2;
      sb = new StringBuilder(); //empty the buffer
     }
    }
    run.setText(sb.toString(), 0); //set all characters, which are currently buffered, as the text of the current run

    // This code part is to manage comment ranges.
    // If we had remembered commentRangeEnd, then move this to here now.
    if(commentRangeEndCursor != null) {
     cursor = run.getCTR().newCursor();
     cursor.toEndToken();
     if (cursor.hasNextToken()) {
      cursor.toNextToken();
      commentRangeEndCursor.moveXml(cursor);
     }
     cursor.dispose();
     commentRangeEndCursor.dispose();
    }

   }
   runNumber++;
  }
 }


 public static void main(String[] args) throws Exception {

  XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));

  String[] keywords = new String[]{"fox", "dog"};
  Map<String, String> formats = new HashMap<String, String>();
  formats.put("bold", "true");
  formats.put("color", "DC143C");

  for (XWPFParagraph paragraph : doc.getParagraphs()) { //go through all paragraphs
   for (String keyword : keywords) {
    formatWord(paragraph, keyword, formats);
   }
  }

  FileOutputStream out = new FileOutputStream("result.docx");
  doc.write(out);
  out.close();
  doc.close();

  System.out.println("Done");
  Desktop.getDesktop().open(new File("result.docx"));

 }
}

This code also takes care of XML markup range elements such as commentRangeEnd that immediately follow the run's r element. Such markup range elements mark the start and end of groups of other elements. For example, the group of text run elements that a comment applies to lies between a commentRangeStart and a commentRangeEnd having the same id.

If a commentRangeEnd immediately follows the run that needs to be split, we remember its position in a cursor. After splitting the run, we move this commentRangeEnd to directly behind the last newly inserted run. So comments should stay correct.
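Why the end marker has to move can be illustrated with the JDK's DOM parser alone. The two paragraph fragments below are hand-written and heavily simplified (real WordprocessingML uses the w: prefix, namespaces and more attributes), and the sketch assumes that the newly inserted runs end up behind a marker that is left in place. All names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class CommentRangeSketch {

    // Run "lazy dog." was split, but commentRangeEnd stayed behind the first run.
    static final String LEFT_IN_PLACE = "<p><commentRangeStart id=\"0\"/>"
        + "<r><t>lazy </t></r><commentRangeEnd id=\"0\"/>"
        + "<r><t>dog</t></r><r><t>.</t></r></p>";

    // Same split, with commentRangeEnd moved behind the last new run.
    static final String MOVED = "<p><commentRangeStart id=\"0\"/>"
        + "<r><t>lazy </t></r><r><t>dog</t></r><r><t>.</t></r>"
        + "<commentRangeEnd id=\"0\"/></p>";

    // Returns the texts of all runs between commentRangeStart and commentRangeEnd with the given id.
    static List<String> commentedRuns(String xml, String id) {
        try {
            Node p = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            List<String> texts = new ArrayList<>();
            boolean inside = false;
            NodeList children = p.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) continue;
                Node idAttr = child.getAttributes().getNamedItem("id");
                boolean sameId = idAttr != null && id.equals(idAttr.getNodeValue());
                String name = child.getNodeName();
                if ("commentRangeStart".equals(name) && sameId) inside = true;
                else if ("commentRangeEnd".equals(name) && sameId) inside = false;
                else if ("r".equals(name) && inside) texts.add(child.getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(commentedRuns(LEFT_IN_PLACE, "0")); // only the first piece is commented
        System.out.println(commentedRuns(MOVED, "0"));         // all three pieces are commented
    }
}
```

With the marker left in place, the comment would shrink to the first piece of the split run; moving it keeps the commented range the same as before the split.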

Of course, even this has some disadvantages, because Microsoft Word sometimes stores text in text runs in a rather clumsy way. There is no single general solution for this when Microsoft Word is the source.
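One such case, given here as an assumption about what can happen rather than something taken from a specific document, is a keyword whose characters are spread over two runs. A per-run check like the one in the example above cannot see it, as this plain-string sketch with hypothetical names shows:

```java
import java.util.Arrays;
import java.util.List;

class SplitKeywordAcrossRuns {

    // True if at least one single run text contains the keyword (all a per-run check can see).
    static boolean foundPerRun(List<String> runTexts, String keyword) {
        for (String text : runTexts) {
            if (text.contains(keyword)) return true;
        }
        return false;
    }

    // True if the keyword occurs in the paragraph text, i.e. the concatenation of all run texts.
    static boolean foundInParagraph(List<String> runTexts, String keyword) {
        return String.join("", runTexts).contains(keyword);
    }

    public static void main(String[] args) {
        // Hypothetical run layout: "fox" is stored as "fo" at the end of one run and "x" at the start of the next.
        List<String> runTexts = Arrays.asList("The quick brown fo", "x jumps over the lazy dog.");
        System.out.println(foundPerRun(runTexts, "fox"));      // false
        System.out.println(foundInParagraph(runTexts, "fox")); // true
    }
}
```

Handling such documents would require matching against the paragraph text and mapping the match back to run boundaries, which the example above does not attempt.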
