Apache POI: get a table from a text box

Problem description

I'm using Apache POI to iterate over the tables in a docx file. Everything works fine, but if a table is inside a text box, my code does not see it: table.size() is 0.

XWPFDocument doc = new XWPFDocument(new FileInputStream(fileName));

// getTables() returns no entry for a table that sits inside a text box
List<XWPFTable> table = doc.getTables();

for (XWPFTable xwpfTable : table) {
    List<XWPFTableRow> row = xwpfTable.getRows();
    for (XWPFTableRow xwpfTableRow : row) {
        List<XWPFTableCell> cell = xwpfTableRow.getTableCells();
        for (XWPFTableCell xwpfTableCell : cell) {
            if (xwpfTableCell != null) {
                // tables nested inside this cell
                List<XWPFTable> itable = xwpfTableCell.getTables();
                if (itable.size() != 0) {
                    for (XWPFTable xwpfiTable : itable) {
                        List<XWPFTableRow> irow = xwpfiTable.getRows();
                        for (XWPFTableRow xwpfiTableRow : irow) {
                            List<XWPFTableCell> icell = xwpfiTableRow.getTableCells();
                            for (XWPFTableCell xwpfiTableCell : icell) {
                                if (xwpfiTableCell != null) {
                                    // the nested-table cell would be processed here
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Recommended answer

The following code parses a *.docx document at a low level and collects all tables in its document body.

The approach uses an org.apache.xmlbeans.XmlCursor to search document.xml for all w:tbl elements. Each one found is added to a List<CTTbl>.

Because a text box rectangle shape also provides fall-back content in document.xml, we need to skip the mc:Fallback elements. Otherwise the tables inside text boxes would be collected twice.
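To make the double counting concrete, here is a minimal sketch that uses only the JDK's StAX parser instead of XmlCursor. The class name FallbackSkipDemo, the method countTables and the SAMPLE fragment are hypothetical; the fragment is a heavily simplified stand-in for document.xml, not what Word actually writes.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class FallbackSkipDemo {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Hypothetical, simplified fragment: one text box table, stored once under
    // mc:Choice and once more under mc:Fallback.
    static final String SAMPLE =
        "<w:body xmlns:w='" + W + "' xmlns:mc='" + MC + "'>"
        + "<w:p><w:r><mc:AlternateContent>"
        + "<mc:Choice><w:txbxContent><w:tbl/></w:txbxContent></mc:Choice>"
        + "<mc:Fallback><w:txbxContent><w:tbl/></w:txbxContent></mc:Fallback>"
        + "</mc:AlternateContent></w:r></w:p>"
        + "</w:body>";

    // Counts w:tbl start elements; when skipFallback is true, everything
    // below an mc:Fallback element is ignored.
    static int countTables(String xml, boolean skipFallback) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int tables = 0;
            int fallbackDepth = 0; // greater than 0 while inside mc:Fallback
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    boolean isFallback = MC.equals(reader.getNamespaceURI())
                        && "Fallback".equals(reader.getLocalName());
                    if (fallbackDepth > 0 || (skipFallback && isFallback)) {
                        fallbackDepth++;
                    } else if (W.equals(reader.getNamespaceURI())
                        && "tbl".equals(reader.getLocalName())) {
                        tables++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && fallbackDepth > 0) {
                    fallbackDepth--;
                }
            }
            reader.close();
            return tables;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without skipping: " + countTables(SAMPLE, false)); // 2
        System.out.println("with skipping: " + countTables(SAMPLE, true));     // 1
    }
}
```

The depth counter plays the role that jumping to the end token of mc:Fallback plays in an XmlCursor based walk: nothing below the fall-back element is inspected.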

Finally, we iterate over the List<CTTbl> and read the contents of all the tables.

import java.io.*;
import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTbl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTc;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;

import org.apache.xmlbeans.impl.values.XmlAnyTypeImpl;
import org.apache.xmlbeans.XmlCursor;

import javax.xml.namespace.QName;

import java.util.List;
import java.util.ArrayList;

public class WordReadAllTables {

 public static void main(String[] args) throws Exception {

  XWPFDocument document = new XWPFDocument(new FileInputStream("22.docx"));

  CTBody ctbody = document.getDocument().getBody();

  XmlCursor xmlcursor = ctbody.newCursor();

  QName qnameTbl = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "tbl", "w");
  QName qnameFallback = new QName("http://schemas.openxmlformats.org/markup-compatibility/2006", "Fallback", "mc");

  List<CTTbl> allCTTbls = new ArrayList<CTTbl>();

  // Walk every token below w:body and collect each w:tbl start element.
  while (xmlcursor.hasNextToken()) {
   XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
   if (tokentype.isStart()) {
    if (qnameTbl.equals(xmlcursor.getName())) {
     if (xmlcursor.getObject() instanceof CTTbl) {
      allCTTbls.add((CTTbl)xmlcursor.getObject());
     } else if (xmlcursor.getObject() instanceof XmlAnyTypeImpl) {
      // Untyped content is re-parsed from its XML text into a CTTbl.
      allCTTbls.add(CTTbl.Factory.parse(xmlcursor.getObject().toString()));
     }
    } else if (qnameFallback.equals(xmlcursor.getName())) {
     // Jump to the end of mc:Fallback so its duplicate tables are not collected.
     xmlcursor.toEndToken();
    }
   }
  }

  for (CTTbl cTTbl : allCTTbls) {
   StringBuilder tableHTML = new StringBuilder();
   tableHTML.append("<table>\n");
   for (CTRow cTRow : cTTbl.getTrList()) {
    tableHTML.append(" <tr>\n");
    for (CTTc cTTc : cTRow.getTcList()) {
     tableHTML.append("  <td>");
     for (CTP cTP : cTTc.getPList()) {
      for (CTR cTR : cTP.getRList()) {
       for (CTText cTText : cTR.getTList()) {
        tableHTML.append(cTText.getStringValue());
       }
      }
     }
     tableHTML.append("</td>");
    }
    tableHTML.append("\n </tr>\n");
   }
   tableHTML.append("</table>");

   System.out.println(tableHTML);

  }

  document.close();

 }
}
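
The loop above appends cell text to the HTML verbatim, so a cell containing characters such as < or & would produce broken markup. A small escaping helper could be applied to each cTText.getStringValue() before appending. The class HtmlEscapeSketch and the method escapeHtml are hypothetical names, not part of Apache POI.

```java
class HtmlEscapeSketch {

    // Hypothetical helper: escapes the characters that are significant
    // inside HTML text content.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <td>a &lt; b &amp; c</td>
        System.out.println("<td>" + escapeHtml("a < b & c") + "</td>");
    }
}
```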

The WordReadAllTables code needs the full jar of all the schemas, ooxml-schemas-1.3.jar, as mentioned in the Apache POI FAQ entry faq-N10025.
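
If adding the full schema jar is not an option, the same idea can be roughly sketched with the JDK alone: read word/document.xml straight from the zip package, parse it with a namespace-aware DOM parser, and ignore every w:tbl that has an mc:Fallback ancestor. This is an assumption-laden sketch, not the answer's method. The class DocxTablesJdkOnly and its methods are hypothetical names, sampleDocx() builds a stand-in package that contains only a simplified document.xml, and readAllBytes() assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DocxTablesJdkOnly {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Returns the concatenated w:t text of each w:tbl in word/document.xml,
    // ignoring tables that sit below an mc:Fallback element.
    static List<String> tableTexts(InputStream docx) {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document dom = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(zip.readAllBytes()));
                NodeList tables = dom.getElementsByTagNameNS(W, "tbl");
                for (int i = 0; i < tables.getLength(); i++) {
                    Element table = (Element) tables.item(i);
                    if (!hasFallbackAncestor(table)) {
                        result.add(textOf(table));
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not read word/document.xml", e);
        }
        return result;
    }

    static boolean hasFallbackAncestor(Node node) {
        for (Node n = node.getParentNode(); n != null; n = n.getParentNode()) {
            if (MC.equals(n.getNamespaceURI()) && "Fallback".equals(n.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    static String textOf(Element table) {
        StringBuilder text = new StringBuilder();
        NodeList texts = table.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            text.append(texts.item(i).getTextContent());
        }
        return text.toString();
    }

    // Hypothetical sample table markup with a single cell.
    static String tbl(String cellText) {
        return "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>" + cellText
            + "</w:t></w:r></w:p></w:tc></w:tr></w:tbl>";
    }

    // Builds a stand-in package holding only a simplified word/document.xml.
    static byte[] sampleDocx() {
        String xml = "<w:document xmlns:w='" + W + "' xmlns:mc='" + MC + "'><w:body>"
            + tbl("body")
            + "<w:p><w:r><mc:AlternateContent>"
            + "<mc:Choice><w:txbxContent>" + tbl("boxed") + "</w:txbxContent></mc:Choice>"
            + "<mc:Fallback><w:txbxContent>" + tbl("boxed") + "</w:txbxContent></mc:Fallback>"
            + "</mc:AlternateContent></w:r></w:p></w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // prints [body, boxed]
        System.out.println(tableTexts(new ByteArrayInputStream(sampleDocx())));
    }
}
```

For a real file, something like tableTexts(new FileInputStream("22.docx")) should work in place of the sample bytes. The trade-off is that this returns plain strings rather than typed CTTbl objects, so rows and cells would have to be walked through the DOM by hand.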
