Problem with processing a Word document in Java

Problem description

I need to replace some fields in a Word document file in Java. I am using the Apache POI library, and this is the code I use to replace the words.

for (XWPFParagraph p : doc.getParagraphs()) {
                List<XWPFRun> runs = p.getRuns();
                if (runs != null) {
                    for (XWPFRun r : runs) {
                        String text = r.getText(0);
                        if (text != null)  {
                            System.out.println(text);
                            if (text.contains("[Title]")) {
                                text = text.replace("[Title]", wordBody.getTitle());//your content
                                r.setText(text, 0);
                            }if(text.contains("[Ref_no]")){
                                text=text.replace("[Ref_no]",wordBody.getRefNumber());
                                r.setText(text,0);
                            }
                            if(text.contains("[In_date]")){
                                text=text.replace("[In_date]",wordBody.getDate());
                                r.setText(text,0);
                            }if(text.contains("[FirstName]")){
                                text=text.replace("[FirstName]",wordBody.getFirstName());
                                r.setText(text,0);
                            }if(text.contains("[MiddleName]")){
                                text=text.replace("[MiddleName]",wordBody.getMiddleName());
                                r.setText(text,0);
                            }if(text.contains("[Vehicle_Type]")){
                                text=text.replace("[Vehicle_Type]",wordBody.getVehicleType());
                                r.setText(text,0);
                            }if(text.contains("[Reg_No]")){
                                text=text.replace("[Reg_No]",wordBody.getRegNumber());
                                r.setText(text,0);
                            }if(text.contains("[Location]")){
                                text=text.replace("[Location]",wordBody.getLocation());
                                r.setText(text,0);
                            }if(text.contains("[Issuer_Name]")){
                                text=text.replace("[Issuer_Name]",wordBody.getLocation());
                                r.setText(text,0);
                            }

                        }
                    }
                }
            }

I noticed that not all of the words were replaced, and I didn't know how to fix it. I then printed out all of the text I got, and it looked like this:

This is to certify that [Title] [FirstName] [
MiddleName
] [Surname] has purchased [
Vehicle_Type
] 
having registration [
Reg_No
] from our [Location] Showroom.
Issued By,
[
Issuer

So I need to replace the fields in [] brackets. Some of them, such as [Surname], are printed fine, but others, such as [MiddleName], are broken across lines, and I think that is why it is not working.

This is my text

I am parsing a docx file. Thank you.

Recommended answer

If you look at your screenshot, you will see a red wavy line under MiddleName, Vehicle_Type and Reg_No. That means Word has detected a possible spelling problem there. This is also stored in the file, which is why the texts [MiddleName], [Vehicle_Type] and [Reg_No] are not held in a single text run together with their surrounding brackets. The brackets are in text runs of their own, and so is the text that is marked as a possible spelling problem.
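To see why a per-run search misses these placeholders, the run structure can be modelled with plain strings. The sketch below is only an illustration: the run boundaries are copied from the printed output in the question, and `RunSplitDemo` and `foundInSingleRun` are hypothetical names that are not part of Apache POI.

```java
import java.util.Arrays;
import java.util.List;

public class RunSplitDemo {

    // Mimics the question's loop: a placeholder is found only if a single run contains all of it.
    static boolean foundInSingleRun(List<String> runs, String placeholder) {
        for (String run : runs) {
            if (run.contains(placeholder)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Run texts as printed in the question: "[MiddleName]" is spread over three runs.
        List<String> runs = Arrays.asList(
                "This is to certify that [Title] [FirstName] [",
                "MiddleName",
                "] [Surname] has purchased [");

        // The text of the whole paragraph is the concatenation of its runs.
        String paragraphText = String.join("", runs);

        System.out.println(foundInSingleRun(runs, "[Title]"));       // true
        System.out.println(foundInSingleRun(runs, "[MiddleName]"));  // false
        System.out.println(paragraphText.contains("[MiddleName]"));  // true
    }
}
```

The placeholder exists in the paragraph as a whole, but no single run holds all of it, so a check on each run's text alone never matches.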

This is a well-known problem, and some libraries already try to solve it by detecting the text variables in a more sophisticated way than searching for them in individual text runs. templ4docx is one example.
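One simple way to search beyond individual runs, sketched here on plain strings rather than on real `XWPFRun` objects, is to join the run texts of a paragraph, replace in the joined text, then put the result into the first run and empty the others. This is not a description of how templ4docx works; `CrossRunReplace` and `replaceAcrossRuns` are hypothetical names, and the approach gives up any formatting that differed between the merged runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CrossRunReplace {

    // Joins all run texts, applies every replacement to the joined text,
    // and returns new run texts: the first holds the result, the rest are empty.
    static List<String> replaceAcrossRuns(List<String> runs, Map<String, String> values) {
        String joined = String.join("", runs);
        for (Map.Entry<String, String> entry : values.entrySet()) {
            joined = joined.replace(entry.getKey(), entry.getValue());
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            result.add(i == 0 ? joined : "");
        }
        return result;
    }

    public static void main(String[] args) {
        // Run boundaries as in the question's output: "[Reg_No]" is split over three runs.
        List<String> runs = Arrays.asList(
                "having registration [", "Reg_No", "] from our [Location] Showroom.");

        Map<String, String> values = new LinkedHashMap<>();
        values.put("[Reg_No]", "1234-56-789");
        values.put("[Location]", "Stuttgart");

        List<String> replaced = replaceAcrossRuns(runs, values);
        System.out.println(replaced.get(0));
        // prints: having registration 1234-56-789 from our Stuttgart Showroom.
    }
}
```

With Apache POI, the same idea would read each run's text with `getText(0)` and write it back with `setText(text, 0)`, the two calls the question's code already uses.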

But my preferred way is a different one. Word has long supported text form fields; see Working with Form Fields. Note that the legacy form fields are meant, not the ActiveX ones.

See also Replace text templates inside .docx (Apache POI, Docx4j or other).

Modified example for your case:

WordTemplate.docx:

All gray fields are legacy text form fields inserted from the Developer tab. In their Text Form Field Options, the Bookmark names are Text1, Text2, ... and default texts are set as needed.

Code:

import java.io.FileOutputStream;
import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.SimpleValue;
import javax.xml.namespace.QName;

public class WordReplaceTextInFormFields {

 private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
  boolean foundformfield = false;
  for (XWPFParagraph paragraph : document.getParagraphs()) {
   for (XWPFRun run : paragraph.getRuns()) {
    XmlCursor cursor = run.getCTR().newCursor();
    cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:fldChar/@w:fldCharType");
    while(cursor.hasNextSelection()) {
     cursor.toNextSelection();
     XmlObject obj = cursor.getObject();
     if ("begin".equals(((SimpleValue)obj).getStringValue())) {
      cursor.toParent();
      obj = cursor.getObject();
      obj = obj.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/@w:val")[0];
      if (ffname.equals(((SimpleValue)obj).getStringValue())) {
       foundformfield = true;
      } else {
       foundformfield = false;
      }
     } else if ("end".equals(((SimpleValue)obj).getStringValue())) {
      if (foundformfield) return;
      foundformfield = false;
     }
    }
    if (foundformfield && run.getCTR().getTList().size() > 0) {
     run.getCTR().getTList().get(0).setStringValue(text);
//System.out.println(run.getCTR());
    }
   }
  }
 }

 public static void main(String[] args) throws Exception {

  // try-with-resources closes both streams and the document even if an exception is thrown
  try (FileInputStream in = new FileInputStream("WordTemplate.docx");
       XWPFDocument document = new XWPFDocument(in);
       FileOutputStream out = new FileOutputStream("WordReplaceTextInFormFields.docx")) {

   replaceFormFieldText(document, "Text1", "Mrs.");
   replaceFormFieldText(document, "Text2", "Janis");
   replaceFormFieldText(document, "Text3", "Lyn");
   replaceFormFieldText(document, "Text4", "Joplin");
   replaceFormFieldText(document, "Text5", "Mercedes Benz");
   replaceFormFieldText(document, "Text6", "1234-56-789");
   replaceFormFieldText(document, "Text7", "Stuttgart");

   document.write(out);
  }
 }
}

This code was tested with Apache POI 4.1.0 and needs the full jar of all of the schemas, ooxml-schemas-1.4.jar, as mentioned in FAQ-N10025.
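For a Maven build, the two jars could be declared as in the fragment below. This assumes Maven is the build tool, which the answer does not state; the versions are the ones named above.

```xml
<!-- Assumption: a Maven project; versions match those named in the answer. -->
<dependencies>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.0</version>
  </dependency>
  <!-- Full schema jar, needed for the low-level XML beans used in the code -->
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.4</version>
  </dependency>
</dependencies>
```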

Result:

Note that the gray background of the text fields is only visible in the GUI. It is not printed by default.

Advantages:

The content of a form field can only be formatted as a whole, so it will never be torn apart.

The document can be protected so that only filling in the form fields is possible. The template is then usable as a form in the Word GUI too.
