Convert breaks and paragraph breaks into new lines in Java

Problem description

Basically, I have an HTML fragment containing <br> and <p></p>. I was able to remove all the HTML tags, but doing so leaves the text badly formatted.

I want something like nl2br() in PHP, except with the input and output reversed, and which also takes <p> tags into account. Is there a library for this in Java?

Recommended answer

You basically need to replace each <br> with \n and each <p> with \n\n. So, at the points where you remove those tags, you need to insert \n and \n\n respectively.

Here's a kickoff example using the Jsoup HTML parser (the HTML sample is intentionally written so that it is hard, if not nearly impossible, to handle with regex).

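import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Br2Nl {

    public static void main(String[] args) throws Exception {
        String originalHtml = "<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>";
        String text = br2nl(originalHtml);
        String newHtml = nl2br(text);

        System.out.println(originalHtml);
        System.out.println("-------------");
        System.out.println(text);
        System.out.println("-------------");
        System.out.println(newHtml);
    }

    // Converts <br> to \n and <p> to \n\n, then strips all remaining markup.
    public static String br2nl(String html) {
        Document document = Jsoup.parse(html);
        // Insert a literal backslash-n placeholder, because text() normalizes real whitespace.
        document.select("br").append("\\n");
        document.select("p").prepend("\\n\\n");
        // Swap the placeholders for real newlines once the tags are gone.
        return document.text().replace("\\n", "\n");
    }

    // Reverse direction: \n\n becomes <p>, a single \n becomes <br>.
    public static String nl2br(String text) {
        return text.replace("\n\n", "<p>").replace("\n", "<br>");
    }
}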

(Note: replaceAll() is unnecessary, as we just want a simple charsequence-by-charsequence replacement here, not a regex-pattern-by-charsequence replacement.)
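
To illustrate the note above, here is a minimal sketch of the difference between the two methods. The class name ReplaceDemo and its helper methods are made up for this illustration; only String.replace() and String.replaceAll() come from the standard library.

```java
public class ReplaceDemo {

    // Literal replacement: "." is treated as a plain character sequence.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: "." is a pattern that matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c")); // prints a-b-c
        System.out.println(regex("a.b.c"));   // prints -----
    }
}
```

Since "\n" and "\\n" contain no characters that need regex treatment in the answer's code, replace() is the simpler choice there.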

Output:

<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>
-------------


p1l1 
p1l2 



p2l1 
p2l2
-------------
<p>p1l1 <br>p1l2 <br> <br> <p>p2l1 <br>p2l2

A bit hacky, but it works.
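
The nl2br() in the answer emits <p> tags that are never closed, as the last line of the output shows. If well-formed paragraphs are wanted, one possible variant is sketched below. The class and method names (ParagraphNl2Br, nl2brWithParagraphs) are hypothetical and not part of the answer, and the sketch assumes the input text is already HTML-escaped and uses \n as its only line terminator.

```java
public class ParagraphNl2Br {

    // Hypothetical helper: wraps each block separated by two or more
    // newlines in <p>...</p> and turns remaining single newlines into <br>.
    static String nl2brWithParagraphs(String text) {
        StringBuilder html = new StringBuilder();
        // trim() drops leading/trailing newlines so no empty paragraph is
        // produced at the edges; the split pattern is a regex on purpose.
        for (String paragraph : text.trim().split("\n{2,}")) {
            html.append("<p>")
                .append(paragraph.replace("\n", "<br>"))
                .append("</p>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(nl2brWithParagraphs("a\nb\n\nc")); // prints <p>a<br>b</p><p>c</p>
    }
}
```

Unlike the answer's version, this one needs a regex split, because a paragraph break may consist of more than two consecutive newlines.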
