Removing an XWPFParagraph keeps the paragraph symbol (¶) for it


Question

I am trying to remove a set of contiguous paragraphs from a Microsoft Word document using Apache POI.

From what I have understood, a paragraph can be deleted by removing all of its runs, like this:

/*
 * Deletes the given paragraph.
 */
public static void deleteParagraph(XWPFParagraph p) {
    if (p != null) {
        List<XWPFRun> runs = p.getRuns();
        //Delete all the runs
        for (int i = runs.size() - 1; i >= 0; i--) {
            p.removeRun(i);
        }
        p.setPageBreak(false); //Remove the eventual page break
    }
}

It works, but with a strange side effect. The block of removed paragraphs does not disappear from the document; it is converted into a set of empty lines, as if every paragraph had been turned into a line break.

Printing the paragraphs' content from code shows a single space for each removed paragraph. Looking at the document itself with formatting marks enabled, I can see a vertical column of ¶ symbols that corresponds to the block of deleted elements.

Do you have any idea how to fix this? I'd like my paragraphs to be removed completely.

I also tried replacing the text (with setText()) and removing any spacing or indentation that might be added automatically, like this:

p.setSpacingAfter(0);
p.setSpacingAfterLines(0);
p.setSpacingBefore(0);
p.setSpacingBeforeLines(0);
p.setIndentFromLeft(0);
p.setIndentFromRight(0);
p.setIndentationFirstLine(0);
p.setIndentationLeft(0);
p.setIndentationRight(0);

But no luck.

Recommended answer

I would delete the paragraphs themselves rather than only the runs inside them; removing the runs leaves an empty paragraph behind, which is why the ¶ marks remain. The low-level way to do this is XWPFDocument.getDocument().getBody(), which returns the underlying CTBody, and CTBody has a removeP(int i) method.

Example:

import java.awt.Desktop;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class WordRemoveParagraph {

    /*
     * Deletes the given paragraph from its document.
     */
    public static void deleteParagraph(XWPFParagraph p) {
        XWPFDocument doc = p.getDocument();
        // Position of the paragraph among the document's body elements
        int pPos = doc.getPosOfParagraph(p);
        // Low-level alternative; does not update the XWPFDocument's paragraphs list:
        //doc.getDocument().getBody().removeP(pPos);
        doc.removeBodyElement(pPos);
    }

    public static void main(String[] args) throws IOException {

        // try-with-resources closes the document and both streams
        try (FileInputStream in = new FileInputStream("source.docx");
             XWPFDocument doc = new XWPFDocument(in)) {

            // Walk the list backwards so that removals do not shift
            // the indices of paragraphs that are still to be checked
            int pNumber = doc.getParagraphs().size() - 1;
            while (pNumber >= 0) {
                XWPFParagraph p = doc.getParagraphs().get(pNumber);
                if (p.getParagraphText().contains("delete")) {
                    deleteParagraph(p);
                }
                pNumber--;
            }

            try (FileOutputStream out = new FileOutputStream("result.docx")) {
                doc.write(out);
            }
        }

        System.out.println("Done");
        // Open the result in the default application for .docx files
        Desktop.getDesktop().open(new File("result.docx"));
    }

}

This deletes every paragraph in source.docx whose text contains "delete" and saves the result in result.docx.

Although doc.getDocument().getBody().removeP(pPos); works, it does not update the XWPFDocument's paragraphs list. It therefore breaks paragraph iterators and any other access to that list, because the list is only rebuilt when the document is read again.
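The stale-list problem can be illustrated with a small toy model. This is not Apache POI code: the class StaleListModel and its methods are hypothetical stand-ins, assuming only that the wrapper keeps a cached list that is filled once when the document is loaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not part of Apache POI) of a wrapper that keeps
// a cached paragraph list next to the underlying body.
public class StaleListModel {

    // Stand-in for the underlying XML body
    private final List<String> body = new ArrayList<>();
    // Stand-in for the wrapper's cached paragraph list, filled once at load time
    private final List<String> paragraphs = new ArrayList<>();

    public StaleListModel(List<String> initial) {
        body.addAll(initial);
        paragraphs.addAll(initial);
    }

    // Low-level removal: only the underlying body changes
    public void removeLowLevel(int pos) {
        body.remove(pos);
    }

    // High-level removal: body and cached list are kept in sync
    public void removeHighLevel(int pos) {
        body.remove(pos);
        paragraphs.remove(pos);
    }

    public List<String> getBody() {
        return body;
    }

    public List<String> getParagraphs() {
        return paragraphs;
    }

    public static void main(String[] args) {
        StaleListModel doc = new StaleListModel(Arrays.asList("a", "delete me", "b"));
        doc.removeLowLevel(1);
        // The body has shrunk, but the cached list still holds the removed entry
        System.out.println("body: " + doc.getBody());
        System.out.println("cached: " + doc.getParagraphs());
    }
}
```

After the low-level removal, code that iterates over the cached list still sees the removed entry, which is the kind of inconsistency described above.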

So the better approach is to use doc.removeBodyElement(pPos); instead. If pos points to a paragraph in the document body, removeBodyElement(int pos) does exactly the same as doc.getDocument().getBody().removeP(pos);, since a paragraph is a BodyElement too. In addition, it updates the XWPFDocument's paragraphs list.
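For the original goal of removing a contiguous block of paragraphs, the same calls can be wrapped in a loop. The following is a minimal sketch, assuming the block is identified by its first and last index in doc.getParagraphs(); the class RemoveParagraphRange and the helper removeParagraphs are hypothetical names, not part of Apache POI.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class RemoveParagraphRange {

    /*
     * Hypothetical helper: removes the paragraphs with indices first..last
     * (inclusive) of doc.getParagraphs().
     */
    public static void removeParagraphs(XWPFDocument doc, int first, int last) {
        // Walk backwards so that earlier indices stay valid after each removal
        for (int i = last; i >= first; i--) {
            XWPFParagraph p = doc.getParagraphs().get(i);
            // removeBodyElement expects the position among all body elements
            // (paragraphs and tables), so translate the paragraph first
            int pos = doc.getPosOfParagraph(p);
            doc.removeBodyElement(pos);
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a small document in memory instead of reading a file
        try (XWPFDocument doc = new XWPFDocument()) {
            for (int i = 0; i < 5; i++) {
                doc.createParagraph().createRun().setText("paragraph " + i);
            }
            removeParagraphs(doc, 1, 3);
            // List what is left after the removal
            for (XWPFParagraph p : doc.getParagraphs()) {
                System.out.println(p.getText());
            }
        }
    }
}
```

The main method builds a five-paragraph document in memory and removes the middle three. Because each removal goes through removeBodyElement, doc.getParagraphs() stays consistent between iterations.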
