Getting text style from docx using Apache POI
Question
I'm trying to get the style information from an MS docx file. I have no problem writing file content with added styles such as bold, italic, and font size, but reading the file content and getting the style information is not so clear. I've tried XWPFDocument, but that API does not seem to be able to read the styles. I'm now trying XWPFWordExtractor, which seems a bit more promising, but I'm still stuck getting the style information for the text.
The content I'm reading looks similar to the following.
"Hello, this is bold text and this is italic text and this is bold-italic text"
Any pointers to an example would be great.
Recommended answer
Okay, so based on the comments from Gagravarr, the solution is below, and it does exactly what I wanted. Gagravarr essentially answered the question, but I'm not sure how to give him credit other than by saying so here.
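import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// docx is an XWPFDocument that has already been opened.
for (XWPFParagraph paragraph : docx.getParagraphs()) {
    int pos = 0; // character offset within the current paragraph
    for (XWPFRun run : paragraph.getRuns()) {
        // Bold and italic are properties of the run, not of the paragraph.
        System.out.println("Current run IsBold : " + run.isBold());
        System.out.println("Current run IsItalic : " + run.isItalic());
        // Print the run's text one character at a time, advancing the offset.
        for (char c : run.text().toCharArray()) {
            System.out.print(c);
            pos++;
        }
        System.out.println();
    }
}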
The output is below:
Current run IsBold : false
Current run IsItalic : false
"Hello, this is
Current run IsBold : true
Current run IsItalic : false
bold text
Current run IsBold : false
Current run IsItalic : false
and this is
Current run IsBold : false
Current run IsItalic : true
italic text
Current run IsBold : false
Current run IsItalic : false
a
Current run IsBold : false
Current run IsItalic : false
n
Current run IsBold : false
Current run IsItalic : false
d this is
Current run IsBold : true
Current run IsItalic : true
bold-italic text
Current run IsBold : false
Current run IsItalic : false
"
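Note that the output above reports the word "and" as three separate runs ("a", "n", "d this is") even though all three carry the same bold and italic flags. Word is free to split identically formatted text across several runs, so code that wants one piece of text per style may need to join neighbouring runs itself. The following is a minimal sketch of that idea in plain Java. `StyledText` and `RunMerger` are hypothetical names introduced here for illustration and are not part of Apache POI; in real code each `StyledText` would presumably be built from `run.text()`, `run.isBold()` and `run.isItalic()`. The whitespace in the sample values is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunMerger {

    /** Hypothetical holder for one styled piece of text; not a POI type. */
    static final class StyledText {
        final String text;
        final boolean bold;
        final boolean italic;

        StyledText(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Joins neighbouring pieces that share the same bold and italic flags. */
    static List<StyledText> merge(List<StyledText> pieces) {
        List<StyledText> merged = new ArrayList<>();
        for (StyledText piece : pieces) {
            int last = merged.size() - 1;
            if (last >= 0
                    && merged.get(last).bold == piece.bold
                    && merged.get(last).italic == piece.italic) {
                // Same style as the previous piece: append the text to it.
                StyledText prev = merged.get(last);
                merged.set(last, new StyledText(prev.text + piece.text, prev.bold, prev.italic));
            } else {
                // Style changed (or first piece): start a new entry.
                merged.add(piece);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Sample values modelled on the output above; the spacing is assumed.
        List<StyledText> pieces = Arrays.asList(
                new StyledText("italic text", false, true),
                new StyledText(" a", false, false),
                new StyledText("n", false, false),
                new StyledText("d this is ", false, false),
                new StyledText("bold-italic text", true, true));

        for (StyledText s : merge(pieces)) {
            System.out.println("bold=" + s.bold + " italic=" + s.italic + " text=[" + s.text + "]");
        }
    }
}
```

With the sample values, the three unstyled fragments collapse into a single " and this is " entry, while the italic and bold-italic pieces stay separate. Only bold and italic are compared here; a fuller version would also compare any other properties it cares about, such as font size.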