Convert breaks and paragraph breaks into new lines in Java
Question
Basically, I have an HTML fragment with <br> and <p></p> inside. I was able to remove all the HTML tags, but doing so leaves the text badly formatted.
I want something like PHP's nl2br(), but with the input and output reversed, and that also takes <p> tags into account. Is there a Java library for this?
Recommended answer
You basically need to replace each <br> with \n and each <p> with \n\n. So, at the points where you successfully remove them, you need to insert \n and \n\n respectively.
Here's a kickoff example using the Jsoup HTML parser (the HTML sample is intentionally written so that it is hard, if not nearly impossible, to handle with regex).
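    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class Br2Nl { // class name is arbitrary; the original shows only the methods

        public static void main(String[] args) throws Exception {
            String originalHtml = "<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>";
            String text = br2nl(originalHtml);
            String newHtml = nl2br(text);
            System.out.println(originalHtml);
            System.out.println("-------------");
            System.out.println(text);
            System.out.println("-------------");
            System.out.println(newHtml);
        }

        // Converts <br> and <p> elements to newlines and strips all remaining markup.
        public static String br2nl(String html) {
            Document document = Jsoup.parse(html);
            // Insert a literal backslash-n placeholder, because text() normalizes real whitespace.
            document.select("br").append("\\n");
            document.select("p").prepend("\\n\\n");
            // Once the tags are gone, turn the placeholders into real newlines.
            return document.text().replace("\\n", "\n");
        }

        // Reverse direction: a double newline becomes <p>, a single newline becomes <br>.
        public static String nl2br(String text) {
            return text.replace("\n\n", "<p>").replace("\n", "<br>");
        }
    }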
(Note: replaceAll() is unnecessary because we only want a simple charsequence-by-charsequence replacement here, not a regexpattern-by-charsequence replacement.)
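To make the difference in the note above concrete, here is a minimal standalone sketch using only the standard String API. The class name ReplaceDemo and its helper methods are hypothetical and exist only for illustration; they are not part of the answer's code.

```java
// Hypothetical demo class: contrasts literal replace() with regex-based replaceAll().
class ReplaceDemo {

    // replace() treats both arguments as plain character sequences.
    static String literalDots(String s) {
        return s.replace(".", "-");
    }

    // replaceAll() compiles its first argument as a regex, where "." matches any character.
    static String regexDots(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literalDots("a.b")); // prints a-b
        System.out.println(regexDots("a.b"));   // prints ---

        // Newlines have no special regex meaning, so plain replace() is enough for nl2br.
        String html = "x\n\ny\nz".replace("\n\n", "<p>").replace("\n", "<br>");
        System.out.println(html);               // prints x<p>y<br>z
    }
}
```

Since the search strings here are fixed text rather than patterns, replace() says exactly what is meant and avoids any need to think about regex escaping.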
Output:
<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>
-------------
p1l1
p1l2
p2l1
p2l2
-------------
<p>p1l1 <br>p1l2 <br> <br> <p>p2l1 <br>p2l2
A bit hacky, but it works.
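One detail of the nl2br() helper above that is easy to miss is the order of the two replace() calls: the double newline has to be handled before the single one. The sketch below illustrates why, using only the standard library. The class and method names (Nl2brOrderDemo, paragraphsFirst, lineBreaksFirst) are hypothetical and chosen just for this comparison.

```java
// Hypothetical demo class: shows why "\n\n" must be replaced before "\n".
class Nl2brOrderDemo {

    // Same order as the answer's nl2br(): paragraphs first, then line breaks.
    static String paragraphsFirst(String text) {
        return text.replace("\n\n", "<p>").replace("\n", "<br>");
    }

    // Reversed order: every newline is already consumed by the first call,
    // so the second call never finds a "\n\n" to turn into <p>.
    static String lineBreaksFirst(String text) {
        return text.replace("\n", "<br>").replace("\n\n", "<p>");
    }

    public static void main(String[] args) {
        String text = "p1l1\np1l2\n\np2l1";
        System.out.println(paragraphsFirst(text)); // prints p1l1<br>p1l2<p>p2l1
        System.out.println(lineBreaksFirst(text)); // prints p1l1<br>p1l2<br><br>p2l1
    }
}
```

With the reversed order, paragraph boundaries silently degrade into two consecutive <br> tags, which is why the longer sequence is matched first.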