How to define different styles within the same paragraph


Question

I'm trying to convert HTML text to generate a Word table. It works pretty well, and the created Word file is correct, except for the character styles.

This is my first try with Apache POI.

So far, I have been able to detect line break (<br>) tags in the paragraph text (see code below). But I'd also like to handle a few other tags such as <b>, <li>, <font> and set the right run values for each part.

For example:
This is my text <i> which now is in italic<b> but also in bold</b> depending on its importance</i>

I guess I should parse the text and apply a different run to each part, but I don't know how to do it.

private static XWPFParagraph getTableParagraph(XWPFTableCell cell, String text)
{
    int fontsize = 11;
    XWPFParagraph paragraph = cell.addParagraph();
    cell.removeParagraph(0); // remove the paragraph that was at index 0 before
    paragraph.setSpacingAfterLines(0);
    paragraph.setSpacingAfter(0);
    XWPFRun myRun1 = paragraph.createRun();
    if (text == null) text = "";
    else
    {
        // cut the text at every literal "<br>" and add a line break in its place
        while (true)
        {
            int x = text.indexOf("<br>");
            if (x < 0) break;
            String work = text.substring(0, x);
            text = text.substring(x + 4); // skip the 4 characters of "<br>"
            myRun1.setText(work);
            myRun1.addBreak();
        }
    }

    // whatever follows the last "<br>" (or the whole text if there was none)
    myRun1.setText(text);
    myRun1.setFontSize(fontsize);
    return paragraph;
}

Solution

When converting HTML text, one should never process the HTML using string methods alone. XML and HTML are markup languages: their content is markup, not just plain text. The markup has to be traversed to get at every single node together with its meaning. This traversal is never trivial, which is why dedicated libraries exist. Deep inside, those libraries use string methods too, but they wrap them in useful methods for traversing the markup.
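
To make the point about nodes concrete, here is a minimal sketch that uses only the JDK's built-in XML parser instead of a dedicated HTML library. It assumes the fragment is well-formed XML, which real-world HTML often is not. The names MarkupWalkSketch, describe and walk are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MarkupWalkSketch {

    // Parses a well-formed fragment and returns one line per node, in document order.
    static List<String> describe(String xhtml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            walk(root, "", out);
            return out;
        } catch (Exception e) {
            // Assumption of this sketch: anything that is not well-formed XML is rejected.
            throw new IllegalArgumentException("fragment is not well-formed XML", e);
        }
    }

    // Depth-first walk: record the node itself, then its children with deeper indentation.
    private static void walk(Node node, String indent, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.add(indent + "#text \"" + node.getNodeValue() + "\"");
        } else if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(indent + "<" + node.getNodeName() + ">");
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        String fragment = "<p>This is my text <i>which now is in italic<b> but also in bold</b>"
                + " depending on its importance</i></p>";
        describe(fragment).forEach(System.out::println);
    }
}
```

The printed tree shows that the bold text node sits inside the b element, which in turn sits inside the i element. That nesting is exactly the information a plain indexOf/substring loop does not keep track of.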

For traversing HTML, jsoup can be used, for example. NodeTraversor together with NodeVisitor is especially useful here.

My example creates a ParagraphNodeVisitor that implements NodeVisitor. This interface requires the method public void head(Node node, int depth), which is called each time the NodeTraversor first reaches a node, and public void tail(Node node, int depth), which is called once all of that node's descendants have been visited. The handling of the individual nodes can be implemented in these methods. In our case, the main part of the work is deciding whether a new XWPFRun is needed and which settings that run needs.
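
Before the full example, the head/tail idea can be tried out without jsoup or POI. The following sketch is not the jsoup API: Visitor, traverse, RunCollector and runsOf are hypothetical names, the nodes are JDK DOM nodes, and the input is assumed to be well-formed XML. It tracks only italic and bold and records which flags are active for each piece of text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class HeadTailSketch {

    // Hypothetical stand-in for a node visitor with the same two callbacks.
    interface Visitor {
        void head(Node node, int depth);
        void tail(Node node, int depth);
    }

    // Depth-first traversal: head() before a node's children, tail() after them.
    static void traverse(Visitor visitor, Node node, int depth) {
        visitor.head(node, depth);
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            traverse(visitor, child, depth + 1);
        }
        visitor.tail(node, depth);
    }

    // Records one entry per text node, tagged with the style flags active at that point.
    static class RunCollector implements Visitor {
        final List<String> runs = new ArrayList<>();
        private boolean italic;
        private boolean bold;

        @Override
        public void head(Node node, int depth) {
            String name = node.getNodeName();
            if (node.getNodeType() == Node.TEXT_NODE) {
                runs.add("[" + (italic ? "i" : "-") + (bold ? "b" : "-") + "] " + node.getNodeValue());
            } else if ("i".equals(name)) {
                italic = true;   // switched on when the element opens
            } else if ("b".equals(name)) {
                bold = true;
            }
        }

        @Override
        public void tail(Node node, int depth) {
            String name = node.getNodeName();
            if ("i".equals(name)) {
                italic = false;  // switched off when the element closes
            } else if ("b".equals(name)) {
                bold = false;
            }
        }
    }

    static List<String> runsOf(String xhtml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml))).getDocumentElement();
            RunCollector collector = new RunCollector();
            traverse(collector, root, 0);
            return collector.runs;
        } catch (Exception e) {
            throw new IllegalArgumentException("fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        runsOf("<p>This is my text <i>which now is in italic<b> but also in bold</b>"
                + " depending on its importance</i></p>").forEach(System.out::println);
    }
}
```

For the sentence from the question this yields four entries: plain text, italic, italic plus bold, then italic again. In the real conversion each of those entries would correspond to one XWPFRun with the matching settings.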

Example:

import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.*;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.jsoup.select.NodeVisitor;
import org.jsoup.select.NodeTraversor;

public class HTMLtoDOCX {

 private XWPFDocument document;

 public HTMLtoDOCX(String html, String docxPath) throws Exception {

  this.document = new XWPFDocument();

  XWPFParagraph paragraph = null;
  Document htmlDocument = Jsoup.parse(html);
  Elements htmlParagraphs = htmlDocument.select("p");
  for(Element htmlParagraph : htmlParagraphs) {

   System.out.println(htmlParagraph); // debug output

   paragraph = document.createParagraph();
   createParagraphFromHTML(paragraph, htmlParagraph);
  }

  // try-with-resources closes the stream even if writing fails
  try (FileOutputStream out = new FileOutputStream(docxPath)) {
   document.write(out);
  }
  document.close();

 }

 void createParagraphFromHTML(XWPFParagraph paragraph, Element htmlParagraph) {

  ParagraphNodeVisitor nodeVisitor = new ParagraphNodeVisitor(paragraph);
  NodeTraversor.traverse(nodeVisitor, htmlParagraph);
 
 }

 private class ParagraphNodeVisitor implements NodeVisitor {

  String nodeName;
  boolean needNewRun;
  boolean isItalic;
  boolean isBold;
  boolean isUnderlined;
  int fontSize;
  String fontColor;
  XWPFParagraph paragraph;
  XWPFRun run;

  ParagraphNodeVisitor(XWPFParagraph paragraph) {
   this.paragraph = paragraph;
   this.run = paragraph.createRun();
   this.nodeName = "";
   this.needNewRun = false;
   this.isItalic = false;
   this.isBold = false;
   this.isUnderlined = false;
   this.fontSize = 11;
   this.fontColor = "000000";

  }

  @Override
  public void head(Node node, int depth) {
   nodeName = node.nodeName();

   System.out.println("Start "+nodeName+": " + node); // debug output

   needNewRun = false;
   if ("#text".equals(nodeName)) {
    run.setText(((TextNode)node).text());
    needNewRun = true; //after setting the text in the run a new run is needed
   } else if ("i".equals(nodeName)) {
    isItalic = true;
   } else if ("b".equals(nodeName)) {
    isBold = true;
   } else if ("u".equals(nodeName)) {
    isUnderlined = true;
   } else if ("br".equals(nodeName)) {
    run.addBreak();
   } else if ("font".equals(nodeName)) {
    // assumes color is given as "#RRGGBB" (the leading '#' is cut off) and size as an integer
    fontColor = (!"".equals(node.attr("color")))?node.attr("color").substring(1):"000000";
    fontSize = (!"".equals(node.attr("size")))?Integer.parseInt(node.attr("size")):11;
   } 
   if (needNewRun) run = paragraph.createRun();
   needNewRun = false;
   run.setItalic(isItalic);
   run.setBold(isBold);
   if (isUnderlined) run.setUnderline(UnderlinePatterns.SINGLE); else run.setUnderline(UnderlinePatterns.NONE);
   run.setColor(fontColor); run.setFontSize(fontSize);
  }

  @Override
  public void tail(Node node, int depth) {
   nodeName = node.nodeName();

   System.out.println("End "+nodeName); // debug output

   if ("i".equals(nodeName)) {
    isItalic = false;
   } else if ("b".equals(nodeName)) {
    isBold = false;
   } else if ("u".equals(nodeName)) {
    isUnderlined = false;
   } else if ("font".equals(nodeName)) {
    fontColor = "000000";
    fontSize = 11;
   }
   if (needNewRun) run = paragraph.createRun();
   needNewRun = false;
   run.setItalic(isItalic);
   run.setBold(isBold);
   if (isUnderlined) run.setUnderline(UnderlinePatterns.SINGLE); else run.setUnderline(UnderlinePatterns.NONE);
   run.setColor(fontColor); run.setFontSize(fontSize);
  }
 }

 public static void main(String[] args) throws Exception {

  String html = 
   "<p><font size='32' color='#0000FF'><b>First paragraph.</font></b><br/>Just like a heading</p>"
  +"<p>This is my text <i>which now is in italic <b>but also in bold</b> depending on its <u>importance</u></i>.<br/>Now a <b><i><u>new</u></i></b> line starts <i>within <b>the same</b> paragraph</i>.</p>"
  +"<p><b>Last <u>paragraph <i>comes</u> here</b> finally</i>.</p>"
  +"<p>But yet <u><i><b>another</i></u></b> paragraph having <i><font size='22' color='#FF0000'>special <u>font</u> settings</font></i>. Now default font again.</p>";

  HTMLtoDOCX htmlToDOCX = new HTMLtoDOCX(html, "./CreateWordParagraphFromHTML.docx");

 }
}

Result: (screenshot of the generated Word document is not included here)

Disclaimer: This is a working draft showing the principle. It is neither complete nor ready for use in production environments.
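
One spot where the draft is fragile is the font branch of head(): substring(1) assumes the color always starts with '#', and Integer.parseInt fails on a non-numeric size. A slightly more defensive pair of helpers might look like the sketch below. FontAttributeSketch, normalizeColor and parseSize are hypothetical names, and the defaults simply mirror the "000000" and 11 used in the example. Named colors such as "red" are not resolved here; they fall back to the default.

```java
import java.util.Locale;

class FontAttributeSketch {

    static final String DEFAULT_COLOR = "000000";
    static final int DEFAULT_SIZE = 11;

    // Returns "RRGGBB" for inputs like "#0000FF" or "0000ff"; anything else yields the default.
    static String normalizeColor(String attr) {
        if (attr == null) {
            return DEFAULT_COLOR;
        }
        String value = attr.trim();
        if (value.startsWith("#")) {
            value = value.substring(1);
        }
        return value.matches("[0-9a-fA-F]{6}") ? value.toUpperCase(Locale.ROOT) : DEFAULT_COLOR;
    }

    // Returns the size as a positive integer; missing or invalid values yield the default.
    static int parseSize(String attr) {
        if (attr == null) {
            return DEFAULT_SIZE;
        }
        try {
            int size = Integer.parseInt(attr.trim());
            return size > 0 ? size : DEFAULT_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_SIZE;
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizeColor("#0000FF") + " " + parseSize("32"));
        System.out.println(normalizeColor("red") + " " + parseSize(""));
    }
}
```

In the visitor these could replace the two ternary expressions, for example fontColor = FontAttributeSketch.normalizeColor(node.attr("color")).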
