Jsoup: parsing HTML to get a class inside a class

Question

I am developing an Android application that uses jsoup to parse HTML.

I have the following HTML:

<div class='wrapper'>
    <div style='margin:7px;'>
        <div class='box' style='height:595px'>
            <div class='boxtitlebox'>
                <div class='boxtitle'><h4>13 RECENT CHORDS</h4></div><div class='clear'></div>
            </div>

            <div class='listitem'><a href='http://www.chordfrenzy.com/chord/9742/ungu-apa-sih-maumu-kord-lirik-lagu'>
                <div class='subtitle'>Chord Ungu</div>
                <div class='title'>Apa Sih Maumu</div>
            </a></div>
            <div class='listitem'><a href='http://www.chordfrenzy.com/chord/6826/slank-boneka-tersayang-kord-lirik-lagu'>
                <div class='subtitle'>Chord Slank</div>
                <div class='title'>Boneka Tersayang</div>
            </a></div>
            <div class='listitem'><a href='http://www.chordfrenzy.com/chord/6751/ari-lasso-rayuan-gombal-kord-lirik-lagu'>
                <div class='subtitle'>Chord Ari Lasso</div>
                <div class='title'>Rayuan Gombal</div>
            </a></div>
        </div>
    </div>
</div>

Now I am confused: how can I get each link (the a href), subtitle, and title from the HTML above?

I need them to fill arrays like this:

String[] link = {"http://www.chordfrenzy.com/chord/9742/ungu-apa-sih-maumu-kord-lirik-lagu", "http://www.chordfrenzy.com/chord/6826/slank-boneka-tersayang-kord-lirik-lagu", "http://www.chordfrenzy.com/chord/6751/ari-lasso-rayuan-gombal-kord-lirik-lagu"};
String[] subtitle = {"Chord Ungu", "Chord Slank", "Chord Ari Lasso"};
String[] title = {"Apa Sih Maumu", "Boneka Tersayang", "Rayuan Gombal"};

Any ideas?

Recommended answer

In general, you should prefer the Selector API over the DOM methods (getElementsByX).

Here is an example:

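import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// html is a String holding the markup shown in the question
Document doc = Jsoup.parse(html);


// Links: every <a> element that has an href attribute
List<String> links = new ArrayList<>();

for( Element element : doc.select("a[href]") )
{
    links.add(element.attr("href"));
}


// Subtitles: every <div> whose class attribute is exactly "subtitle"
List<String> subtitles = new ArrayList<>();

for( Element element : doc.select("div[class=subtitle]") )
{
    subtitles.add(element.text());
}


// Titles: every <div> whose class attribute is exactly "title"
List<String> titles = new ArrayList<>();

for( Element element : doc.select("div[class=title]") )
{
    titles.add(element.text());
}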
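
The question asks for String[] arrays, while the example collects List<String> values. A minimal sketch of the conversion, using only the standard library, is shown here. The class name ListToArrayExample and the helper toArray are hypothetical names chosen for illustration, and the list in main is a stand-in for one filled by the jsoup loops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListToArrayExample {

    // Hypothetical helper: copies a list of strings into a new String[]
    // with the same elements in the same order.
    static String[] toArray(List<String> values) {
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for the "titles" list built from the parsed document;
        // the values are the ones given in the question.
        List<String> titles = new ArrayList<>();
        titles.add("Apa Sih Maumu");
        titles.add("Boneka Tersayang");
        titles.add("Rayuan Gombal");

        String[] title = toArray(titles);
        System.out.println(Arrays.toString(title));
    }
}
```

The same call works for the links and subtitles lists. If nothing downstream strictly requires arrays, keeping the lists is usually simpler.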

Elements are selected by tag and attribute; if the tags differ or are not relevant, you can omit them (e.g. [class=title] instead of div[class=title]). Take a look at the Selector API documentation for more tips.
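
One thing to keep in mind with three separate selections is that the resulting lists line up by index only if every entry contains a link, a subtitle, and a title. A possible variation, sketched here under the assumption that each entry is wrapped in a div with class listitem as in the question's HTML, selects each entry once and reads its parts relative to that element. ChordListParser and Chord are hypothetical names, not part of jsoup; the jsoup calls used are Jsoup.parse, Element.select, Elements.first, Elements.text, and Element.attr.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ChordListParser {

    // Hypothetical holder for one list entry (not a jsoup type).
    static final class Chord {
        final String link;
        final String subtitle;
        final String title;

        Chord(String link, String subtitle, String title) {
            this.link = link;
            this.subtitle = subtitle;
            this.title = title;
        }
    }

    // Returns one Chord per div.listitem that contains a link.
    static List<Chord> parse(String html) {
        Document doc = Jsoup.parse(html);
        List<Chord> chords = new ArrayList<>();

        for (Element item : doc.select("div.listitem")) {
            // Look only inside the current entry, so the three values
            // always belong to the same item.
            Element anchor = item.select("a[href]").first();
            if (anchor == null) {
                continue; // skip entries without a link
            }
            String subtitle = item.select("div.subtitle").text();
            String title = item.select("div.title").text();
            chords.add(new Chord(anchor.attr("href"), subtitle, title));
        }
        return chords;
    }

    public static void main(String[] args) {
        String html =
            "<div class='listitem'><a href='http://www.chordfrenzy.com/chord/9742/ungu-apa-sih-maumu-kord-lirik-lagu'>"
            + "<div class='subtitle'>Chord Ungu</div>"
            + "<div class='title'>Apa Sih Maumu</div>"
            + "</a></div>";

        for (Chord chord : parse(html)) {
            System.out.println(chord.subtitle + " | " + chord.title + " | " + chord.link);
        }
    }
}
```

The selector div.listitem uses the class shorthand, which also matches elements that carry additional classes, whereas [class=listitem] requires the attribute value to match exactly.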
