Apache POI XWPFRun isBold does not detect bold


Question

I have a docx document that contains some bold text. For some reason run.isBold() returns false although the run itself is bold. What might be the issue here?

The code I am using to read the file:

XWPFDocument document = new XWPFDocument(fis);
for (XWPFParagraph paragraph : document.getParagraphs()) {
    for (XWPFRun run : paragraph.getRuns()) {
        // isBold() only reports bold that is set directly on the run
        System.out.println(run.isBold());
        System.out.println(run.text());
    }
}

The file content looks like this:

  1. Mõisted

2.1. Some text

2.1.1. Pooled – some text

Oddly, it does detect that the title (HANKELEPINGU ÜLDTINGIMUSED) at the beginning of the file is bold, but nothing after that is reported as bold.

Recommended answer

After inspecting your test.docx file, I can tell you the following:

The texts "Üldosa" and "Mõisted" are bold not because they were directly formatted bold but because the whole paragraph uses the style "Heading2". Likewise, the text "Pooled" is not directly formatted bold; instead, the special character style "Paks" is applied to it. So someone has used Word styles extensively. That is not bad at all: just as HTML should be formatted with CSS style sheets rather than directly, styles should be preferred in Word too. But the parsing problem is also the same: without additionally parsing the style sheet, one cannot determine how the text will be rendered. Unfortunately, Apache POI has not taken much care of styles so far.

How can one come to that insight? A *.docx file is simply a ZIP archive. So we can unzip it and will find:

/word/document.xml:

<w:r ...>
 <w:rPr>
  ...
  <w:b/>
  ...
 </w:rPr>
 <w:t>HANKELEPINGU ÜLDTINGIMUSED</w:t>
</w:r>

This is a text run that really is directly formatted bold.

By contrast:

<w:p ...>
 <w:pPr>
  <w:pStyle w:val="Heading2"/>
  <w:numPr><w:ilvl w:val="0"/><w:numId w:val="2"/></w:numPr> 
  ...
 </w:pPr>
 <w:r ...>
  <w:t>Üldosa</w:t>
 </w:r>
</w:p>

This is a paragraph with the style "Heading2" and automatic numbering. Its run has no bold setting of its own.

So why is that text bold? In /word/styles.xml we find:

<w:style w:type="paragraph" w:styleId="Heading2">
 <w:name w:val="heading 2"/>
 <w:basedOn w:val="Normal"/>
 ...
 <w:link w:val="Heading2Char"/>
 ...
</w:style>

This is the paragraph style "Heading2", which links to the character style "Heading2Char".

<w:style w:type="character" w:customStyle="1" w:styleId="Heading2Char">
 <w:name w:val="Heading 2 Char"/>
 ...
 <w:link w:val="Heading2"/>
 ...
 <w:rPr>
  ...
  <w:b/>
  ...
 </w:rPr>
</w:style>

This is the character style "Heading2Char", which is set to bold.
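The XML parts quoted above can also be pulled out of the archive programmatically instead of unzipping by hand. The following is only a minimal sketch that uses nothing but the JDK's java.util.zip classes. The class name DocxPartReader and the method readPart are made up for this illustration, and it assumes the parts are UTF-8 encoded, which is usual but not guaranteed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Apache POI.
public class DocxPartReader {

    // Returns the text of one part of the archive (for example "word/styles.xml"),
    // or null if the archive has no entry with that name.
    // Assumption: the part is UTF-8 encoded.
    public static String readPart(InputStream docx, String partName) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(partName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // "test.docx" is the file name used in this answer; pass another path as first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "test.docx");
        if (!Files.exists(path)) {
            System.out.println("File not found: " + path);
            return;
        }
        for (String part : new String[] {"word/document.xml", "word/styles.xml"}) {
            try (InputStream in = Files.newInputStream(path)) {
                System.out.println("== " + part + " ==");
                System.out.println(readPart(in, part));
            }
        }
    }
}
```

Note that the entry names inside the archive have no leading slash ("word/styles.xml"), although the part is usually written as /word/styles.xml.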

To answer the question of how to handle this with Apache POI, one must know that Apache POI XWPF is based on the org.openxmlformats.schemas.wordprocessingml.x2006.main.* classes, which come from ooxml-schemas.*.jar. So we need information about these classes. Unfortunately, there is no publicly available API documentation for them, so we need to download the sources and generate the Javadoc ourselves.

So what to do next? Iterate over the paragraphs and runs as you have done already. But additionally, for each paragraph, try to get the paragraph's style. If there is one, get it and its linked character style and check which settings it provides. Also, for each run, try to get the run's character style. If there is one, get it and check which settings it provides.

The following code does this, but it only checks whether a style provides a bold setting. So it is far from complete, and making it complete would take considerable effort.

import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

// Written against the ooxml-schemas classes named above;
// other releases may expose the on/off value differently.
public class ReadWordHavingStyles {

 // <w:b/> without a w:val attribute means bold; only false, 0 and off switch it off.
 private static boolean isOn(CTOnOff onOff) {
  STOnOff.Enum val = onOff.getVal();
  return !((STOnOff.FALSE == val) || (STOnOff.X_0 == val) || (STOnOff.OFF == val));
 }

 // TRUE or FALSE if the style sets <w:b> explicitly, null if it says nothing about bold.
 private static Boolean boldFromStyle(XWPFStyle style) {
  if (style == null) return null;
  CTRPr cTRPr = style.getCTStyle().getRPr();
  if (cTRPr == null || !cTRPr.isSetB()) return null;
  return isOn(cTRPr.getB());
 }

 public static void main(String[] args) throws Exception {

  try (FileInputStream fis = new FileInputStream("test.docx");
       XWPFDocument document = new XWPFDocument(fis)) {

   XWPFStyles styles = document.getStyles();

   for (XWPFParagraph paragraph : document.getParagraphs()) {

    // 1. paragraph style -> linked character style
    boolean isPBold = false;
    String boldReasonP = " P is not bold";
    String pStyleId = paragraph.getStyleID();
    XWPFStyle pStyle = (pStyleId != null) ? styles.getStyle(pStyleId) : null;
    String linkStyleId = (pStyle != null) ? pStyle.getLinkStyleID() : null;
    if (linkStyleId != null && Boolean.TRUE.equals(boldFromStyle(styles.getStyle(linkStyleId)))) {
     isPBold = true;
     boldReasonP = " whole P is bold because of style " + linkStyleId;
    }

    for (XWPFRun run : paragraph.getRuns()) {
     // a run starts with what its paragraph provides
     boolean isRBold = isPBold;
     String boldReasonR = "";
     CTRPr cTRPr = run.getCTR().getRPr();

     // 2. character style applied to the run (<w:rStyle>);
     //    a style that does not mention bold leaves the inherited value unchanged
     if (cTRPr != null && cTRPr.getRStyle() != null) {
      String rStyleId = cTRPr.getRStyle().getVal();
      Boolean bold = boldFromStyle(styles.getStyle(rStyleId));
      if (bold != null) {
       isRBold = bold;
       boldReasonR = " run is " + (isRBold ? "" : "not ") + "bold because of style " + rStyleId;
      }
     }

     // 3. direct formatting on the run overrides both styles
     if (cTRPr != null && cTRPr.isSetB()) {
      isRBold = isOn(cTRPr.getB());
      boldReasonR = " run is " + (isRBold ? "" : "not ") + "bold because of direct formatting";
     }

     if (!isRBold && boldReasonR.isEmpty()) boldReasonR = " run is not bold";

     System.out.println(run.text() + " isBold:" + isRBold + ":" + boldReasonP + boldReasonR);
    }
   }
  }
 }
}
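One of the gaps in the code above is that it does not follow <w:basedOn>, so a bold setting inherited from a parent style is missed. The following is only a rough sketch of how that chain could be walked. To stay free of dependencies it parses the XML of /word/styles.xml with the JDK's DOM parser instead of using Apache POI. The class StyleBoldResolver, its methods and the sample styles DerivedChar and LightChar are made up for this illustration; only Heading2Char mirrors the fragment quoted earlier. It assumes that the nearest explicit <w:b> in the chain wins, and it ignores document defaults and Word's rules for combining paragraph and character styles.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
public class StyleBoldResolver {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Made-up sample: DerivedChar and LightChar are hypothetical styles.
    public static final String SAMPLE =
        "<w:styles xmlns:w='" + W + "'>"
      + "<w:style w:type='paragraph' w:styleId='Normal'><w:name w:val='Normal'/></w:style>"
      + "<w:style w:type='character' w:styleId='Heading2Char'><w:rPr><w:b/></w:rPr></w:style>"
      + "<w:style w:type='character' w:styleId='DerivedChar'><w:basedOn w:val='Heading2Char'/></w:style>"
      + "<w:style w:type='character' w:styleId='LightChar'><w:basedOn w:val='Heading2Char'/>"
      + "<w:rPr><w:b w:val='0'/></w:rPr></w:style>"
      + "</w:styles>";

    private final Document styles;

    public StyleBoldResolver(String stylesXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the w: prefix lookups below
            styles = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(stylesXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse styles XML", e);
        }
    }

    // TRUE or FALSE if the style or one of its basedOn ancestors sets <w:b>,
    // null if nothing in the chain mentions bold or the style is unknown.
    public Boolean isBold(String styleId) {
        Set<String> seen = new HashSet<>(); // guards against cyclic basedOn references
        while (styleId != null && seen.add(styleId)) {
            Element style = findStyle(styleId);
            if (style == null) return null;
            Element rPr = child(style, "rPr");
            Element b = (rPr == null) ? null : child(rPr, "b");
            if (b != null) return isOn(b.getAttributeNS(W, "val"));
            Element basedOn = child(style, "basedOn");
            styleId = (basedOn == null) ? null : basedOn.getAttributeNS(W, "val");
        }
        return null;
    }

    // A missing w:val (empty string here) means "on".
    static boolean isOn(String val) {
        return !(val.equals("false") || val.equals("0") || val.equals("off"));
    }

    private Element findStyle(String styleId) {
        NodeList all = styles.getElementsByTagNameNS(W, "style");
        for (int i = 0; i < all.getLength(); i++) {
            Element style = (Element) all.item(i);
            if (styleId.equals(style.getAttributeNS(W, "styleId"))) return style;
        }
        return null;
    }

    // First direct child element <w:localName>, or null.
    private static Element child(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        StyleBoldResolver resolver = new StyleBoldResolver(SAMPLE);
        for (String id : new String[] {"Heading2Char", "DerivedChar", "LightChar", "Normal"}) {
            System.out.println(id + " -> " + resolver.isBold(id));
        }
    }
}
```

The three-valued result (TRUE, FALSE, null) is deliberate: a style that says nothing about bold must not overwrite what was inherited from elsewhere.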
