Replace text templates inside .docx (Apache POI, Docx4j or other)


Problem description

I want to do replacements in an MS Word (.docx) document using regular expressions (Java RegEx):

Example: 
 …, с одной стороны, и %SOME_TEXT% именуемое в дальнейшем «Заказчик», в 
 лице  %SOME_TEXT%   действующего на основании %SOME_TEXT% с другой стороны, 
 заключили настоящий Договор о нижеследующем: …

I tried to find the text templates (like %SOME_TEXT%) with Apache POI XWPF and replace the text, but replacement is not guaranteed, because the text comes back split into separate runs, so I get something like this (System.out.println(run.getText(0))):

…
, с одной стороны, и 
%
SOME_TEXT
%

именуемое 
в дальнейшем «Заказчик», в лице

%
SOME
_
TEXT
%

Code example:

FileInputStream fis = new FileInputStream(new File("document.docx"));
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
paragraphs.forEach(para -> {
    para.getRuns().forEach(run -> {
        String text = run.getText(0);
        if (text != null) {
           System.out.println(text);
           // text replacement process
           // run.setText(newText,0);
        }
    });
});

I have found many similar questions (like "Replacing a text in Apache POI XWPF"), but did not find an answer to my problem (the answer in "Seperated text line in Apache POI XWPFRun object" offers an inconvenient solution).

I tried to use docx4j and this example => "docx4j find and replace", but docx4j works similarly.

For docx4j, see stackoverflow.com/questions/17093781/… – JasonPlutext

I tried to use docx4j => documentPart.variableReplace(mappings);, but replacement is not guaranteed (plutext/docx4j).

Did you use VariablePrepare? stackoverflow.com/a/17143488/1031689 – JasonPlutext

Yes, no results:

WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("test.docx"));
HashMap<String, String> mappings = new HashMap<>();
VariablePrepare.prepare(wordMLPackage);//see notes
mappings.put("SOME_TEXT", "XXXX");
wordMLPackage.getMainDocumentPart().variableReplace(mappings);
wordMLPackage.save(new File("out.docx"));

Input/output text:

Input:
…, с одной стороны, и ${SOME_TEXT} именуемое в дальнейшем «Заказчик» ...
Output:
…, с одной стороны, и SOME_TEXT именуемое в дальнейшем «Заказчик» ...

To see your runs after VariablePrepare, turn on INFO level logging for VariablePrepare, or just System.out.println(wordMLPackage.getMainDocumentPart().getXML())

I understand that the templates were split into different Runs, but the main question of the topic is how to keep a template from being split into different Runs. I used System.out.println(wordMLPackage.getMainDocumentPart().getXML()) and saw:

<w:r>
   <w:t xml:space="preserve">, с одной стороны, и </w:t>
</w:r>
<w:r><w:t>$</w:t></w:r>
<w:r><w:t>{</w:t></w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:eastAsia="Times-Roman"/>
        <w:color w:val="000000" w:themeColor="text1"/>
        <w:lang w:val="en-US"/>
    </w:rPr>
    <w:t>SOME</w:t>        <!-- First part of template: "SOME" -->
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:eastAsia="Times-Roman"/>
        <w:color w:val="000000" w:themeColor="text1"/>
    </w:rPr>
    <w:t>_</w:t>           <!-- Second part of template: "_"   -->
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:eastAsia="Times-Roman"/>
        <w:color w:val="000000" w:themeColor="text1"/>
        <w:lang w:val="en-US"/>
    </w:rPr>
    <w:t>TEXT</w:t>        <!-- Third part of template: "TEXT" -->
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:eastAsia="Times-Roman"/>
        <w:color w:val="000000" w:themeColor="text1"/>
    </w:rPr>
    <w:t>}</w:t>
</w:r>

So the template ends up in different XML tags, and I do not understand why.

Please help me find a convenient approach to replace the text.

Solution

As you see, the approach "to do replacements in MS Word (.docx) document using regular expression (java RegEx)" is not really good, since you can never be sure that the text to replace will be together in one text run. A better approach is to use fields (merge fields or form fields) or content controls in Word.
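To illustrate the point, here is a minimal JDK-only sketch, not POI code. It models the runs of a paragraph as plain strings (simplified ASCII text, an assumption for the example) and shows that a regex applied to each run finds nothing, while matching against the concatenated text works but needs offset bookkeeping to write the result back. The class and the helpers `countRunsMatching` and `replaceAcrossRuns` are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrossRunReplace {

    // Counts the runs in which the pattern matches when each run is checked on its own.
    static int countRunsMatching(List<String> runs, Pattern pattern) {
        int count = 0;
        for (String run : runs) {
            if (pattern.matcher(run).find()) count++;
        }
        return count;
    }

    // Replaces every match found in the concatenated run text. The literal
    // replacement goes into the run where the match starts; the matched
    // characters are removed from all runs the match covers.
    static List<String> replaceAcrossRuns(List<String> runs, Pattern pattern, String replacement) {
        StringBuilder joined = new StringBuilder();
        int[] starts = new int[runs.size()];
        List<StringBuilder> edited = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            starts[i] = joined.length();          // offset of run i in the joined text
            joined.append(runs.get(i));
            edited.add(new StringBuilder(runs.get(i)));
        }

        List<int[]> matches = new ArrayList<>();
        Matcher m = pattern.matcher(joined);
        while (m.find()) matches.add(new int[] {m.start(), m.end()});

        // Work from the last match to the first so earlier offsets stay valid.
        for (int k = matches.size() - 1; k >= 0; k--) {
            int s = matches.get(k)[0];
            int e = matches.get(k)[1];
            for (int i = 0; i < runs.size(); i++) {
                int runStart = starts[i];
                int runEnd = runStart + runs.get(i).length();
                int from = Math.max(s, runStart);
                int to = Math.min(e, runEnd);
                if (from >= to) continue;         // match does not touch this run
                edited.get(i).delete(from - runStart, to - runStart);
                if (from == s) edited.get(i).insert(s - runStart, replacement);
            }
        }

        List<String> result = new ArrayList<>();
        for (StringBuilder sb : edited) result.add(sb.toString());
        return result;
    }

    public static void main(String[] args) {
        // Runs split the same way as in the question (text simplified to ASCII).
        List<String> runs = Arrays.asList("and ", "%", "SOME", "_", "TEXT", "%", " hereinafter");
        Pattern placeholder = Pattern.compile("%SOME_TEXT%");

        System.out.println(countRunsMatching(runs, placeholder));
        System.out.println(replaceAcrossRuns(runs, placeholder, "XXXX"));
    }
}
```

Even in this simplified model the replacement takes on the position (and, in a real document, the formatting) of the run where the match starts, and the emptied runs remain. That is the kind of bookkeeping the field-based approach avoids.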

My favourites for such requirements are still the good old form fields in Word.

The first advantage is that, even without document protection, it is not possible to format parts of a form field's content differently, so the content cannot be torn apart into different runs (but see note 1). The second advantage is that, because of their gray background, the form fields are clearly visible in the document content. Another advantage is the possibility of applying document protection so that only the form fields can be filled in, even in Word's GUI. This is really good for protecting such contractual documents from unwanted changes.

(Note 1): At least Word prevents formatting parts of a form field's content differently and so tearing the content apart into different runs. Other word-processing software (Writer, for example) may not respect this restriction, though.

So I would have the Word template like so:

The grey fields are the good old legacy text form fields in Word, named Text1, Text2 and Text3. The run that starts such a text field looks like:

<xml-fragment w:rsidR="00833656" 
  ...
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
 ... >
  <w:rPr>
    <w:rFonts w:eastAsia="Times-Roman"/>
    <w:color w:themeColor="text1" w:val="000000"/>
    <w:lang w:val="en-US"/>
  </w:rPr>
  <w:fldChar w:fldCharType="begin">
    <w:ffData>
      <w:name w:val="Text1"/>
      <w:enabled w:val="0"/>
      <w:calcOnExit w:val="0"/>
      <w:textInput>
        <w:default w:val="&lt;введите заказчика&gt;"/>
      </w:textInput>
    </w:ffData>
  </w:fldChar>
</xml-fragment>
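To see which node in such a fragment carries the field name, the following JDK-only sketch evaluates an XPath expression against a hand-written run. It uses `javax.xml.xpath` rather than XmlBeans, and the class name, the helper `formFieldName` and the sample XML (with an ASCII default text) are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FormFieldNameXPath {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the name of the form field that begins in the given run XML,
    // or an empty string if the run does not begin a form field.
    static String formFieldName(String runXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed so the w: prefix resolves to its namespace
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(runXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return W.equals(uri) ? "w" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return W.equals(uri)
                        ? Collections.singletonList("w").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            // The "begin" fldChar holds ffData, whose w:name/@w:val is the field name.
            return xpath.evaluate("//w:fldChar[@w:fldCharType='begin']/w:ffData/w:name/@w:val", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate run XML", e);
        }
    }

    public static void main(String[] args) {
        String run =
            "<w:r xmlns:w='" + W + "'>"
          + "<w:fldChar w:fldCharType='begin'><w:ffData>"
          + "<w:name w:val='Text1'/><w:enabled w:val='0'/><w:calcOnExit w:val='0'/>"
          + "<w:textInput><w:default w:val='&lt;enter customer&gt;'/></w:textInput>"
          + "</w:ffData></w:fldChar></w:r>";
        System.out.println(formFieldName(run));
    }
}
```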

Then the following code:

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.*;

import org.apache.xmlbeans.SimpleValue;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;

public class WordReplaceTextInFormFields {

    private static final String W_NS =
        "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' ";

    // Sets the text of the legacy form field named ffname to the given text.
    private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
        boolean foundformfield = false;
        for (XWPFParagraph paragraph : document.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                // Look at every field character (begin, separate, end) in this run.
                XmlCursor cursor = run.getCTR().newCursor();
                cursor.selectPath(W_NS + ".//w:fldChar/@w:fldCharType");
                while (cursor.hasNextSelection()) {
                    cursor.toNextSelection();
                    XmlObject obj = cursor.getObject();
                    String type = ((SimpleValue) obj).getStringValue();
                    if ("begin".equals(type)) {
                        // Move from the attribute to its w:fldChar element and read the field name.
                        cursor.toParent();
                        XmlObject[] names = cursor.getObject().selectPath(W_NS + ".//w:ffData/w:name/@w:val");
                        // Fields without ffData (not form fields) yield no name; skip them.
                        foundformfield = names.length > 0
                            && ffname.equals(((SimpleValue) names[0]).getStringValue());
                    } else if ("end".equals(type)) {
                        if (foundformfield) return; // the wanted field is finished
                        foundformfield = false;
                    }
                }
                // Inside the wanted field: overwrite the first text element of the run.
                if (foundformfield && run.getCTR().getTList().size() > 0) {
                    run.getCTR().getTList().get(0).setStringValue(text);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument document = new XWPFDocument(new FileInputStream("WordTemplate.docx"));
             FileOutputStream out = new FileOutputStream("WordReplaceTextInFormFields.docx")) {

            replaceFormFieldText(document, "Text1", "Моя Компания");
            replaceFormFieldText(document, "Text2", "Аксель Джоачимович Рихтер");
            replaceFormFieldText(document, "Text3", "Доверенность");

            document.write(out);
        }
    }
}

This code needs the full jar of all of the schemas, ooxml-schemas-1.3.jar, as mentioned in the Apache POI FAQ-N10025.

Produces:
