How to replace some tags and remove others with line break in j2ee


Question

The main problem was to get the content of an HTML file and remove all tags.
I have read these questions before:

1 (https://stackoverflow.com/questions/5640334/how-do-i-preserve-line-breaks-when-using-jsoup-to-convert-html-to-plain-text), 2, 3

After reading all of them I decided to use jsoup, and it really helped me. I also worked out how to keep line breaks and replace <p> tags with a line break.
Now my problem is that I have an HTML file with an <H1> tag that holds the title of the whole content. I want to keep the title followed by a line break, but with jsoup the first paragraph comes directly after the title without any line break. Can anyone help me, please?
The HTML code I have:

<DIV class="story-headline">
<H1 class="story-title">NFL 2014 predictions</H1><br>
</DIV>
<H3 class="story-deck">Our picks for playoff teams, surprises, Super Bowl</H3>
<P class="small lighttext">
<SPAN class="delimited">Posted: Sep 02, 2014 1:30 PM ET</SPAN>
<SPAN>Last Updated: Sep 04, 2014 10:27 AM ET</SPAN>
</P>

and the output is:

NFL 2014 predictionsOur picks for playoff teams, surprises, Super Bowl

Posted: Sep 02, 2014 1:30 PM ETLast Updated: Sep 04, 2014 10:27 AM ET  

I want it to be:

NFL 2014 predictions  
Our picks for playoff teams, surprises, Super Bowl  
Posted: Sep 02, 2014 1:30 PM ET  
Last Updated: Sep 04, 2014 10:27 AM ET 


Answer

You should hook the OutputSettings of the target Document, so try the following:

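import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist; // Note: jsoup 1.14.2 and later supersede Whitelist with org.jsoup.safety.Safelist

public class HtmlWithLineBreaks 
{

  // Strips every tag from the document while keeping the source's own line breaks and spacing.
  public String getCleanHtml(Document document)
  {
    document.outputSettings(new Document.OutputSettings().prettyPrint(false)); // makes the html() call preserve line breaks and spacing
    return Jsoup.clean(document.html(),
        "",
        Whitelist.none(), // allow no tags at all, so only text nodes survive
        new Document.OutputSettings().prettyPrint(false));
  }

  public static void main(String... args)
  {
    File input = new File("/path/to/some/input.html"); // Replace with the path to your own HTML file
    Document document;
    try
    {
      document = Jsoup.parse(input, "UTF-8");
      String printOut = new HtmlWithLineBreaks().getCleanHtml(document);
      System.out.println(printOut);
    } catch (IOException e)
    {
      e.printStackTrace();
    } 
  }

}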
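
Because prettyPrint(false) keeps the whitespace of the source file, the cleaned text can still carry indentation and blank lines from the original HTML. A minimal post-processing sketch, using only the Java standard library, that trims each line and drops the empty ones. The class and method names (LineTidySketch, tidyLines) are hypothetical and are not part of jsoup or of the answer above:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not part of jsoup: tidies text such as the result of getCleanHtml(...).
class LineTidySketch
{

  // Trims every line and removes lines that are empty after trimming.
  static String tidyLines(String text)
  {
    return Arrays.stream(text.split("\\R")) // \R matches any line-break sequence
        .map(String::trim)
        .filter(line -> !line.isEmpty())
        .collect(Collectors.joining("\n"));
  }

  public static void main(String[] args)
  {
    // Sample input that mimics cleaned text with stray indentation and blank lines.
    String cleaned = "\n  NFL 2014 predictions \n\n\nOur picks for playoff teams, surprises, Super Bowl\n";
    System.out.println(tidyLines(cleaned));
  }

}
```

It could be applied to the string returned by getCleanHtml before printing, if one item per line without blank lines is the goal.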

Optionally, you can insert custom line breaks after the <div> wrapper of your <h1> if you are not satisfied with the provided output:

public String getCleanHtml(Document document)
{
  document.outputSettings(new Document.OutputSettings().prettyPrint(false));
  document.select("h1").parents().select("div").append("\n"); // Append a newline inside each <div> matched via the <h1>'s ancestors, including its wrapper <div>.
  return Jsoup.clean(document.html(),
      "",
      Whitelist.none(),
      new Document.OutputSettings().prettyPrint(false));
}
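
To make the idea of "replace some tags with a line break, remove the rest" concrete without depending on jsoup, here is a minimal sketch that uses only java.util.regex. It is a swapped-in technique, not the answer's method: regular expressions are fragile on real-world HTML (comments, scripts, a ">" inside an attribute value), and this sketch does not decode entities, so jsoup remains the safer choice. The class name, the method name, and the set of tags treated as line-breaking are assumptions chosen to fit the sample in the question:

```java
import java.util.regex.Pattern;

// Hypothetical, regex-based sketch; not part of jsoup and not robust for arbitrary HTML.
class TagLineBreakSketch
{

  // Closing tags (plus <br>) that are turned into a line break.
  // Assumption: <span> is included only because the question wants each span on its own line.
  private static final Pattern BREAKING_TAGS = Pattern.compile(
      "</(h[1-6]|p|div|span)\\s*>|<br\\s*/?>", Pattern.CASE_INSENSITIVE);

  // Any tag that is still present after the step above; it is simply removed.
  private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

  static String toPlainText(String html)
  {
    String text = BREAKING_TAGS.matcher(html).replaceAll("\n");
    text = ANY_TAG.matcher(text).replaceAll("");

    // Trim each line and skip empty ones so consecutive breaks collapse into one.
    StringBuilder out = new StringBuilder();
    for (String line : text.split("\\R"))
    {
      String trimmed = line.trim();
      if (!trimmed.isEmpty())
      {
        if (out.length() > 0)
        {
          out.append('\n');
        }
        out.append(trimmed);
      }
    }
    return out.toString();
  }

  public static void main(String[] args)
  {
    String html = "<DIV class=\"story-headline\">\n"
        + "<H1 class=\"story-title\">NFL 2014 predictions</H1><br>\n"
        + "</DIV>\n"
        + "<H3 class=\"story-deck\">Our picks for playoff teams, surprises, Super Bowl</H3>\n"
        + "<P class=\"small lighttext\">\n"
        + "<SPAN class=\"delimited\">Posted: Sep 02, 2014 1:30 PM ET</SPAN>\n"
        + "<SPAN>Last Updated: Sep 04, 2014 10:27 AM ET</SPAN>\n"
        + "</P>";
    System.out.println(toPlainText(html));
  }

}
```

For the sample HTML from the question, this yields the title, the deck, and the two date lines, each on its own line. Which tags count as line-breaking is a design choice: moving a tag name in or out of BREAKING_TAGS changes where the breaks fall.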
