jSoup to get text from <span> class

Question

I have part of an HTML file in the following format:

<h6 class="uiStreamMessage" data-ft="_____"> 
   <span class="messageBody" data-ft="____"> Welcome
   </span>
</h6>

The file contains other span classes, but I only want the text of ALL the 'messageBody' spans, which will then be inserted into a database.

I have tried:

Elements links = doc.select("span.messageBody");
for (Element link : links) {
    String message = link.text();
    // code to insert message into the DB
}

and even:

Elements links = doc.select("h6.uiStreamMessage span.messageBody");

Neither works, and I couldn't find a solution elsewhere. Please help.

**Edit

I've realised there is a nested span in the HTML file:

<h6 class="uiStreamMessage" data-ft=""> 
   <span class="messageBody" data-ft="">Twisted<a href="http://"><span>http://</span>
   <span class="word_break"></span>www.tb.net/</a> Balloons
   </span>
</h6>

Only some of the time is there another span inside the 'messageBody' span. How do I get ALL the text within the 'messageBody' span?

Recommended answer

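String html = "<h6 class='uiStreamMessage' data-ft=''><span class='messageBody' data-ft=''>Twisted<a href='http://'><span>http://</span><span class='word_break'></span>www.tb.net/</a> Balloons</span></h6>";
Document doc = Jsoup.parse(html);
// Select messageBody spans that are direct children of the h6
Elements elements = doc.select("h6.uiStreamMessage > span.messageBody");
for (Element e : elements) {
    // text() returns the combined text of the span and all of its descendants
    System.out.println("All text:" + e.text());
    // ownText() returns only the text owned directly by the span, excluding child elements
    System.out.println("Only messageBody text:" + e.ownText());
}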

For the Facebook page https://www.facebook.com/pages/The-Nanyang-Chronicle/141387533074:

try {
    // jsoup treats timeout(0) as an infinite timeout
    Document doc = Jsoup.connect("https://www.facebook.com/pages/The-Nanyang-Chronicle/141387533074").timeout(0).get();

    Elements elements = doc.select("code.hidden_elem");
    for (Element e : elements) {
        // Strip the comment delimiters, then parse the inner markup as its own document
        String eHtml = e.html().replace("<!--", "").replace("-->", "");
        Document eWithoutComment = Jsoup.parse(eHtml);
        Elements elem = eWithoutComment.select("h6.uiStreamMessage > span.messageBody");
        for (Element eb : elem) {
            System.out.println(eb.text());
        }
    }
} catch (IOException ex) {
    System.err.println("Error:" + ex.getMessage());
}
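
The loop above appears to rely on the wanted markup being wrapped in HTML comments inside the `code.hidden_elem` elements, which is why the comment delimiters are stripped before the fragment is parsed again. Below is a minimal sketch of that unwrapping step on its own, using only the standard library. The class name `CommentUnwrap`, the method `unwrapComments`, and the sample string are made up for illustration and are not part of jsoup or the original answer.

```java
// Hypothetical helper for illustration; not part of jsoup or the original answer.
final class CommentUnwrap {

    // Removes HTML comment delimiters so the markup inside them becomes
    // ordinary HTML text that a parser can process.
    static String unwrapComments(String html) {
        return html.replace("<!--", "").replace("-->", "");
    }

    public static void main(String[] args) {
        // Made-up sample: markup hidden inside a comment.
        String hidden = "<!-- <h6 class=\"uiStreamMessage\"><span class=\"messageBody\">Welcome</span></h6> -->";
        System.out.println(unwrapComments(hidden));
    }
}
```

This removes every comment delimiter it finds, so the text of any genuine comment in the fragment would also end up as ordinary content.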
