Replace text templates inside .docx (Apache POI, Docx4j or other)

Question

I want to do replacements in an MS Word (.docx) document using regular expressions (Java regex):

Example: 
 …, с одной стороны, и %SOME_TEXT% именуемое в дальнейшем «Заказчик», в 
 лице  %SOME_TEXT%   действующего на основании %SOME_TEXT% с другой стороны, 
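 заключили настоящий Договор о нижеследующем: …
 (In English: "…, on the one hand, and %SOME_TEXT%, hereinafter referred to as the "Customer", represented by %SOME_TEXT%, acting on the basis of %SOME_TEXT%, on the other hand, have concluded this Agreement as follows: …")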

I tried to find text templates (like %SOME_TEXT%) with Apache POI XWPF and replace them, but the replacement is not reliable, because POI splits the text into separate runs. Printing each run with System.out.println(run.getText(0)) gives something like this:

…
, с одной стороны, и 
%
SOME_TEXT
%

именуемое 
в дальнейшем «Заказчик», в лице

%
SOME
_
TEXT
%

Code example:

import java.io.File;
import java.io.FileInputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

try (FileInputStream fis = new FileInputStream(new File("document.docx"));
     XWPFDocument document = new XWPFDocument(fis)) {
    List<XWPFParagraph> paragraphs = document.getParagraphs();
    paragraphs.forEach(para -> {
        para.getRuns().forEach(run -> {
            String text = run.getText(0);
            if (text != null) {
               System.out.println(text);
               // text replacement process
               // run.setText(newText, 0);
            }
        });
    });
}
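The symptom can be reproduced without POI at all. As a minimal sketch, assume a paragraph is modelled as a hypothetical list of run texts, split the way the output above shows: a regex applied run by run never sees a complete placeholder, while the same regex applied to the joined text does. With POI, the joined text of a paragraph is close to what XWPFParagraph.getText() returns. RunSplitDemo and its methods are illustrative names only.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunSplitDemo {
    // Placeholders of the form %NAME%, e.g. %SOME_TEXT%
    static final Pattern PLACEHOLDER = Pattern.compile("%[A-Z_]+%");

    // Counts placeholders when every run is searched on its own
    static int matchesPerRun(List<String> runs) {
        int count = 0;
        for (String run : runs) {
            Matcher m = PLACEHOLDER.matcher(run);
            while (m.find()) count++;
        }
        return count;
    }

    // Counts placeholders in the concatenated text of all runs
    static int matchesInParagraph(List<String> runs) {
        Matcher m = PLACEHOLDER.matcher(String.join("", runs));
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical run texts, split as in the question's output
        List<String> runs = List.of(", с одной стороны, и ", "%", "SOME", "_", "TEXT", "%", " именуемое");
        System.out.println(matchesPerRun(runs));      // 0
        System.out.println(matchesInParagraph(runs)); // 1
    }
}
```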

I have found many similar questions (like "Replacing a text in Apache POI XWPF"), but did not find an answer to my problem (the answer to "Seperated text line in Apache POI XWPFRun object" offers an inconvenient solution).

I also tried docx4j with the "docx4j find and replace" example, but docx4j behaves the same way.

For docx4j, see stackoverflow.com/questions/17093781/… – JasonPlutext

I tried docx4j's documentPart.variableReplace(mappings), but the replacement is not guaranteed either (plutext/docx4j).

Did you use VariablePrepare? stackoverflow.com/a/17143488/1031689 – JasonPlutext

Yes, but with no result:

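import java.io.File;
import java.util.HashMap;
import org.docx4j.model.datastorage.migration.VariablePrepare;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("test.docx"));
HashMap<String, String> mappings = new HashMap<>();
VariablePrepare.prepare(wordMLPackage); // see notes
mappings.put("SOME_TEXT", "XXXX");
wordMLPackage.getMainDocumentPart().variableReplace(mappings);
wordMLPackage.save(new File("out.docx"));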

Input/output text:

Input:
…, с одной стороны, и ${SOME_TEXT} именуемое в дальнейшем «Заказчик» ...
Output:
…, с одной стороны, и SOME_TEXT именуемое в дальнейшем «Заказчик» ...
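Substitution of ${...} variables can only work when the whole token is inside the string being scanned. The plain-Java sketch below illustrates that constraint; it is not docx4j's implementation, and substitute is a hypothetical helper. The same mapping that turns an intact ${SOME_TEXT} into XXXX leaves the text unchanged when it is fed the pieces one run at a time.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSubstitution {
    // ${NAME} placeholders; group 1 captures NAME
    private static final Pattern VAR = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    // Replaces every ${key} found in one string; unknown keys are left as they are
    static String substitute(String text, Map<String, String> mappings) {
        Matcher m = VAR.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = mappings.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mappings = Map.of("SOME_TEXT", "XXXX");
        // Placeholder intact in one string: it is replaced
        System.out.println(substitute("и ${SOME_TEXT} именуемое", mappings)); // и XXXX именуемое
        // Placeholder spread over runs: no single piece matches
        for (String run : new String[] {"$", "{", "SOME", "_", "TEXT", "}"}) {
            System.out.print(substitute(run, mappings));
        }
        System.out.println(); // prints ${SOME_TEXT}
    }
}
```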

To see your runs after VariablePrepare, turn on INFO level logging for VariablePrepare, or just System.out.println(wordMLPackage.getMainDocumentPart().getXML())

I understand that the templates were split into different runs, but the main question of this topic is how to keep a template from being split into different runs. I used System.out.println(wordMLPackage.getMainDocumentPart().getXML()) and saw:

<w:r>
    <w:t xml:space="preserve">, с одной стороны, и </w:t>
</w:r>
<w:r><w:t>$</w:t></w:r>
<w:r><w:t>{</w:t></w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:eastAsia="Times-Roman"/>
        <w:color w:val="000000" w:themeColor="text1"/>
        <w:lang w:val="en-US"/>
    </w:rPr>
    <w:t>SOME</w:t>        <!-- First part of template: "SOME" -->
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:eastAsia="Times-Roman"/>
        <w:color w:val="000000" w:themeColor="text1"/>
    </w:rPr>
    <w:t>_</w:t>           <!-- Second part of template: "_"   -->
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:eastAsia="Times-Roman"/>
        <w:color w:val="000000" w:themeColor="text1"/>
        <w:lang w:val="en-US"/>
    </w:rPr>
    <w:t>TEXT</w:t>        <!-- Third part of template: "TEXT" -->
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:eastAsia="Times-Roman"/>
        <w:color w:val="000000" w:themeColor="text1"/>
    </w:rPr>
    <w:t>}</w:t>
</w:r>

So the template ends up spread over several separate w:r elements (runs), and I do not understand why.
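Word starts a new run whenever the run properties (w:rPr) change. In the XML above, "SOME" and "TEXT" carry <w:lang w:val="en-US"/> while "_" and "}" do not, and "$" and "{" have no w:rPr at all, so these pieces cannot be one run. A minimal model, assuming a hypothetical Run record whose rPr is reduced to a plain string: merging only neighbouring runs with identical formatting (the kind of clean-up a preparation step can do; this is not docx4j's VariablePrepare code) still leaves the placeholder split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class MergeRunsDemo {
    // Hypothetical model of a w:r element: its text plus a string standing in for its w:rPr
    record Run(String text, String rPr) {}

    // Joins neighbouring runs whose formatting is identical; differently formatted runs stay separate
    static List<Run> mergeIdentical(List<Run> runs) {
        List<Run> merged = new ArrayList<>();
        for (Run r : runs) {
            if (!merged.isEmpty() && Objects.equals(merged.get(merged.size() - 1).rPr(), r.rPr())) {
                Run last = merged.remove(merged.size() - 1);
                merged.add(new Run(last.text() + r.text(), last.rPr()));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String base = "rFonts+color";              // "_" and "}"
        String withLang = "rFonts+color+lang=en-US"; // "SOME" and "TEXT"
        List<Run> runs = List.of(
                new Run(", с одной стороны, и ", ""),
                new Run("$", ""),
                new Run("{", ""),
                new Run("SOME", withLang),
                new Run("_", base),
                new Run("TEXT", withLang),
                new Run("}", base));
        // Prints [, с одной стороны, и ${] [SOME] [_] [TEXT] [}] on separate lines
        mergeIdentical(runs).forEach(r -> System.out.println("[" + r.text() + "]"));
    }
}
```

Only once the differing properties are removed or made equal in Word (for example by retyping the placeholder in one go with one formatting) can the pieces collapse into a single run.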

Please help me find a convenient approach to replacing text.

Solution

As you can see, doing replacements in an MS Word (.docx) document using regular expressions (Java regex) is not really a good approach, since you can never be sure that the text to be replaced will be together in one text run. A better approach is to use fields (merge fields or form fields) or content controls in Word.
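For comparison, the regex route can be made to work by matching on the joined paragraph text and then rewriting the affected runs, which is the kind of workaround the question calls inconvenient. A minimal plain-Java sketch, assuming the runs are modelled as a hypothetical list of strings: the replacement goes into the run where a match starts, and the rest of the match is deleted from the following runs. With POI, each resulting string would be written back with XWPFRun.setText(text, 0); that wiring is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrossRunReplace {
    // Replaces every match of pattern found in the joined text of the runs.
    // The replacement goes into the run where a match starts; the rest of the
    // matched characters are deleted from that run and the following runs.
    static List<String> replace(List<String> runs, Pattern pattern, String replacement) {
        List<StringBuilder> texts = new ArrayList<>();
        for (String r : runs) texts.add(new StringBuilder(r));

        List<int[]> matches = new ArrayList<>();
        Matcher m = pattern.matcher(String.join("", runs));
        while (m.find()) matches.add(new int[] {m.start(), m.end()});

        // Work from the last match backwards so offsets of earlier matches stay valid
        for (int i = matches.size() - 1; i >= 0; i--) {
            int start = matches.get(i)[0], end = matches.get(i)[1];
            int offset = 0;
            boolean inserted = false;
            for (StringBuilder t : texts) {
                int runStart = offset, runEnd = offset + t.length();
                offset = runEnd;
                int from = Math.max(start, runStart), to = Math.min(end, runEnd);
                if (from >= to) continue; // this run does not overlap the match
                t.delete(from - runStart, to - runStart);
                if (!inserted) {
                    t.insert(from - runStart, replacement);
                    inserted = true;
                }
            }
        }

        List<String> result = new ArrayList<>();
        for (StringBuilder t : texts) result.add(t.toString());
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical run texts, split as in the question
        List<String> runs = List.of(", с одной стороны, и ", "%", "SOME", "_", "TEXT", "%", " именуемое");
        List<String> out = replace(runs, Pattern.compile("%SOME_TEXT%"), "Моя Компания");
        System.out.println(String.join("", out)); // , с одной стороны, и Моя Компания именуемое
    }
}
```

The replacement inherits the formatting of the run in which the match starts (here the "%" run), and any formatting differences inside the placeholder are lost, which is one reason the answer prefers form fields.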

My favourites for such requirements are still the good old form fields in Word.

The first advantage is that, even without document protection, Word does not allow parts of a form field's content to be formatted differently, so the content cannot be torn apart into different runs (but see note 1). The second advantage is that form fields are easy to spot in the document because of their grey background. A further advantage is that document protection can be applied so that only filling in the form fields is possible, even in Word's GUI. This is very useful for protecting contractual documents like this one from unwanted changes.

(Note 1): At least Word prevents parts of a form field's content from being formatted differently and thus being split into different runs. Other word-processing software (Writer, for example) may not respect this restriction, though.

So I would have the Word template like so:

The grey fields are the good old legacy text form fields in Word, named Text1, Text2 and Text3. In the XML, the run that starts such a text form field looks like this:

<xml-fragment w:rsidR="00833656"
  ...
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  ... >
  <w:rPr>
    <w:rFonts w:eastAsia="Times-Roman"/>
    <w:color w:themeColor="text1" w:val="000000"/>
    <w:lang w:val="en-US"/>
  </w:rPr>
  <w:fldChar w:fldCharType="begin">
    <w:ffData>
      <w:name w:val="Text1"/>
      <w:enabled w:val="0"/>
      <w:calcOnExit w:val="0"/>
      <w:textInput>
        <w:default w:val="&lt;введите заказчика&gt;"/>
      </w:textInput>
    </w:ffData>
  </w:fldChar>
</xml-fragment>
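The fragment shows where the field name lives: a w:fldChar with w:fldCharType="begin" carries w:ffData/w:name/@w:val. To list which names a template actually contains (the values to pass as ffname in the code that follows), a plain JDK sketch can query the document XML with namespace-aware XPath. This is an illustration, not POI: FormFieldNames and its methods are hypothetical, documentXml assumes the main part sits at the usual word/document.xml path, and the sample paragraph assumes the usual begin / FORMTEXT / separate / result / end run sequence of a text form field.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class FormFieldNames {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads the main document part, assuming it is stored as word/document.xml
    static String documentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            return new String(zip.getInputStream(zip.getEntry("word/document.xml")).readAllBytes(),
                    StandardCharsets.UTF_8);
        }
    }

    // Lists w:ffData/w:name/@w:val of every fldChar "begin" that carries form field data
    static String[] formFieldNames(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed so the w: prefix is resolved
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) { return null; }
            });
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//w:fldChar[@w:fldCharType='begin']/w:ffData/w:name/@w:val", doc, XPathConstants.NODESET);
            String[] names = new String[nodes.getLength()];
            for (int i = 0; i < names.length; i++) names[i] = nodes.item(i).getNodeValue();
            return names;
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("cannot read WordprocessingML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read a real .docx; otherwise use a hypothetical sample paragraph
        String xml = args.length > 0 ? documentXml(args[0])
                : "<w:p xmlns:w='" + W + "'>"
                + "<w:r><w:fldChar w:fldCharType='begin'><w:ffData><w:name w:val='Text1'/>"
                + "<w:textInput><w:default w:val='&lt;введите заказчика&gt;'/></w:textInput>"
                + "</w:ffData></w:fldChar></w:r>"
                + "<w:r><w:instrText xml:space='preserve'> FORMTEXT </w:instrText></w:r>"
                + "<w:r><w:fldChar w:fldCharType='separate'/></w:r>"
                + "<w:r><w:t>&lt;введите заказчика&gt;</w:t></w:r>"
                + "<w:r><w:fldChar w:fldCharType='end'/></w:r></w:p>";
        System.out.println(String.join(", ", formFieldNames(xml))); // Text1 for the sample
    }
}
```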

Then the following code:

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import org.apache.xmlbeans.SimpleValue;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;

public class WordReplaceTextInFormFields {

 private static final String DECLARE_W =
  "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' ";

 // A legacy text form field is a sequence of runs: fldChar "begin" (holding w:ffData/w:name),
 // instrText FORMTEXT, fldChar "separate", the result text run(s), fldChar "end".
 // This sets the result text of the form field named ffname.
 private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
  boolean foundformfield = false;
  boolean textset = false;
  for (XWPFParagraph paragraph : document.getParagraphs()) {
   for (XWPFRun run : paragraph.getRuns()) {
    XmlCursor cursor = run.getCTR().newCursor();
    cursor.selectPath(DECLARE_W + ".//w:fldChar/@w:fldCharType");
    while (cursor.hasNextSelection()) {
     cursor.toNextSelection();
     String fldCharType = ((SimpleValue) cursor.getObject()).getStringValue();
     if ("begin".equals(fldCharType)) {
      cursor.toParent();
      // other fields (PAGE, MERGEFIELD, ...) have no w:ffData/w:name, so check before indexing
      XmlObject[] names = cursor.getObject().selectPath(DECLARE_W + ".//w:ffData/w:name/@w:val");
      foundformfield = names.length > 0 && ffname.equals(((SimpleValue) names[0]).getStringValue());
      textset = false;
     } else if ("end".equals(fldCharType)) {
      if (foundformfield) {
       cursor.dispose();
       return; // the requested form field is done
      }
      foundformfield = false;
     }
    }
    cursor.dispose();
    if (foundformfield && run.getCTR().getTList().size() > 0) {
     // write the new text into the first result run and empty any further result runs
     run.getCTR().getTList().get(0).setStringValue(textset ? "" : text);
     textset = true;
    }
   }
  }
 }

 public static void main(String[] args) throws Exception {

  try (FileInputStream in = new FileInputStream("WordTemplate.docx");
       XWPFDocument document = new XWPFDocument(in);
       FileOutputStream out = new FileOutputStream("WordReplaceTextInFormFields.docx")) {

   replaceFormFieldText(document, "Text1", "Моя Компания");
   replaceFormFieldText(document, "Text2", "Аксель Джоачимович Рихтер");
   replaceFormFieldText(document, "Text3", "Доверенность");

   document.write(out);
  }
 }
}

This code needs the full jar of all of the schemas, ooxml-schemas-1.3.jar, as mentioned in FAQ-N10025. (In POI 5.x the full schema jar is published as poi-ooxml-full.)

Produces:
