Problem with processing a Word document in Java


Problem description


I need to replace some fields in a Word document in Java. I am using the Apache POI library, with this code to replace the words:

for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);
            if (text != null) {
                System.out.println(text);
                if (text.contains("[Title]")) {
                    text = text.replace("[Title]", wordBody.getTitle()); // your content
                    r.setText(text, 0);
                }
                if (text.contains("[Ref_no]")) {
                    text = text.replace("[Ref_no]", wordBody.getRefNumber());
                    r.setText(text, 0);
                }
                if (text.contains("[In_date]")) {
                    text = text.replace("[In_date]", wordBody.getDate());
                    r.setText(text, 0);
                }
                if (text.contains("[FirstName]")) {
                    text = text.replace("[FirstName]", wordBody.getFirstName());
                    r.setText(text, 0);
                }
                if (text.contains("[MiddleName]")) {
                    text = text.replace("[MiddleName]", wordBody.getMiddleName());
                    r.setText(text, 0);
                }
                if (text.contains("[Vehicle_Type]")) {
                    text = text.replace("[Vehicle_Type]", wordBody.getVehicleType());
                    r.setText(text, 0);
                }
                if (text.contains("[Reg_No]")) {
                    text = text.replace("[Reg_No]", wordBody.getRegNumber());
                    r.setText(text, 0);
                }
                if (text.contains("[Location]")) {
                    text = text.replace("[Location]", wordBody.getLocation());
                    r.setText(text, 0);
                }
                if (text.contains("[Issuer_Name]")) {
                    text = text.replace("[Issuer_Name]", wordBody.getLocation());
                    r.setText(text, 0);
                }
            }
        }
    }
}

I noticed that not all of the placeholders were replaced, and I didn't know how to fix it. So I printed out the text of every run, and I got something like this:

This is to certify that [Title] [FirstName] [
MiddleName
] [Surname] has purchased [
Vehicle_Type
] 
having registration [
Reg_No
] from our [Location] Showroom.
Issued By,
[
Issuer

So I need to replace the fields in [] brackets. Some of them, such as [Surname], are printed in one piece, but others, such as [MiddleName], are split across lines, and I think that is why it is not working.

This is the text in my Word document (screenshot):

I am parsing a docx file. Thank you.

Solution

If you look at your screenshot, you will see a red wavy line under MiddleName, Vehicle_Type and Reg_No. That means Word has detected a possible spelling problem there. This is also stored in the file, which is why the texts [MiddleName], [Vehicle_Type] and [Reg_No] are not together in one text run with their surrounding brackets. The brackets are in their own text runs, and the texts marked with the possible spelling problem are in separate runs too.
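The effect can be reproduced without Word or Apache POI. The following is a minimal sketch in plain Java, where the list of strings is a hypothetical stand-in for the run texts printed in the question; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class RunSplitDemo {

    // True if a single run contains the whole placeholder.
    // This is the check the per-run loop in the question performs.
    static boolean foundInSingleRun(List<String> runs, String placeholder) {
        for (String run : runs) {
            if (run.contains(placeholder)) {
                return true;
            }
        }
        return false;
    }

    // True if the placeholder appears once all run texts are joined.
    static boolean foundInJoinedText(List<String> runs, String placeholder) {
        return String.join("", runs).contains(placeholder);
    }

    public static void main(String[] args) {
        // Hypothetical run texts, modelled on the output shown in the question.
        List<String> runs = Arrays.asList("[FirstName] [", "MiddleName", "] [Surname] has purchased");

        System.out.println(foundInSingleRun(runs, "[Surname]"));     // true
        System.out.println(foundInSingleRun(runs, "[MiddleName]"));  // false
        System.out.println(foundInJoinedText(runs, "[MiddleName]")); // true
    }
}
```

[Surname] sits entirely inside one run, so the per-run search finds it. [MiddleName] is spread over three runs and is only visible in the joined paragraph text.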

This is a well-known problem, and some libraries already try to solve it by detecting the text variables in a more complex way than only searching for them in single text runs. templ4docx is one example.
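One way such cross-run detection could work is sketched below in plain Java. This is an illustration under stated assumptions, not how templ4docx or any other library is actually implemented: the run texts are modelled as a list of strings, and the class name CrossRunReplace and its replace method are hypothetical. With Apache POI, each string would correspond to r.getText(0), and the results would be written back with r.setText(text, 0).

```java
import java.util.ArrayList;
import java.util.List;

class CrossRunReplace {

    // Replaces every occurrence of placeholder, even when it is split across
    // several runs. The replacement is put into the run where the match starts;
    // the matched characters are cut out of the runs that follow.
    static List<String> replace(List<String> runs, String placeholder, String replacement) {
        List<String> result = new ArrayList<>(runs);
        if (placeholder.isEmpty()) {
            return result; // nothing sensible to search for
        }
        int from = 0;
        while (true) {
            String joined = String.join("", result);
            int start = joined.indexOf(placeholder, from);
            if (start < 0) {
                return result;
            }
            int end = start + placeholder.length();
            int offset = 0; // position of the current run within the joined text
            for (int i = 0; i < result.size(); i++) {
                String run = result.get(i);
                int runStart = offset;
                int runEnd = offset + run.length();
                offset = runEnd;
                if (runEnd <= start || runStart >= end) {
                    continue; // this run does not overlap the match
                }
                int cutFrom = Math.max(start, runStart) - runStart;
                int cutTo = Math.min(end, runEnd) - runStart;
                // Only the run in which the match starts receives the replacement.
                String insert = (runStart <= start) ? replacement : "";
                result.set(i, run.substring(0, cutFrom) + insert + run.substring(cutTo));
            }
            // Continue after the inserted text so the replacement is never rescanned.
            from = start + replacement.length();
        }
    }

    public static void main(String[] args) {
        List<String> runs = List.of("This is [", "MiddleName", "] [Surname]");
        System.out.println(replace(runs, "[MiddleName]", "Lyn"));
        // prints [This is Lyn, ,  [Surname]]
    }
}
```

The number of runs stays the same, so the formatting of the untouched runs is preserved. The replacement takes the formatting of the run that held the opening bracket.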

But I prefer a different approach. Word has long supported text form fields. See Working with Form Fields. Note that the legacy form fields are meant, not the ActiveX ones.

See Replace text templates inside .docx (Apache POI, Docx4j or other) for an example.

Modified example for your case:

WordTemplate.docx:

All gray fields are legacy text form fields inserted from the Developer tab. In their Text Form Field Options, the Bookmark: names are Text1, Text2, ... and the default texts are set as needed.

Code:

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.SimpleValue;

public class WordReplaceTextInFormFields {

 // Sets the text of the legacy text form field whose name (the Bookmark: name) is ffname.
 // A form field spans several runs: the run with the "begin" fldChar carries the field name,
 // the following runs carry the field's text, and the run with the "end" fldChar closes it.
 private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
  boolean foundformfield = false;
  for (XWPFParagraph paragraph : document.getParagraphs()) {
   for (XWPFRun run : paragraph.getRuns()) {
    XmlCursor cursor = run.getCTR().newCursor();
    cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:fldChar/@w:fldCharType");
    while(cursor.hasNextSelection()) {
     cursor.toNextSelection();
     XmlObject obj = cursor.getObject();
     if ("begin".equals(((SimpleValue)obj).getStringValue())) {
      // Start of a field: read its name from w:ffData/w:name.
      cursor.toParent();
      obj = cursor.getObject();
      XmlObject[] names = obj.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/@w:val");
      // Fields without form field data (no name) are skipped instead of failing on names[0].
      foundformfield = names.length > 0 && ffname.equals(((SimpleValue)names[0]).getStringValue());
     } else if ("end".equals(((SimpleValue)obj).getStringValue())) {
      // End of a field: if it was the wanted one, the work is done.
      if (foundformfield) return;
      foundformfield = false;
     }
    }
    // Inside the wanted field: overwrite the text of a run that has a text element.
    if (foundformfield && run.getCTR().getTList().size() > 0) {
     run.getCTR().getTList().get(0).setStringValue(text);
//System.out.println(run.getCTR());
    }
   }
  }
 }

 public static void main(String[] args) throws Exception {

  // try-with-resources closes the streams and the document even if an exception is thrown.
  try (FileInputStream in = new FileInputStream("WordTemplate.docx");
       XWPFDocument document = new XWPFDocument(in);
       FileOutputStream out = new FileOutputStream("WordReplaceTextInFormFields.docx")) {

   replaceFormFieldText(document, "Text1", "Mrs.");
   replaceFormFieldText(document, "Text2", "Janis");
   replaceFormFieldText(document, "Text3", "Lyn");
   replaceFormFieldText(document, "Text4", "Joplin");
   replaceFormFieldText(document, "Text5", "Mercedes Benz");
   replaceFormFieldText(document, "Text6", "1234-56-789");
   replaceFormFieldText(document, "Text7", "Stuttgart");

   document.write(out);
  }
 }
}

This code was tested using Apache POI 4.1.0 and needs the full jar of all of the schemas, ooxml-schemas-1.4.jar, as mentioned in FAQ-N10025.

Result:

Note that the gray background of the text fields is only visible in the GUI. It will not be printed by default.

Advantages:

The form field content can only be formatted as a whole, so it will never be torn apart into several runs.

The document can be protected so that only filling in the form fields is possible. The template is then usable as a form in the Word GUI too.
