JSOUP parsing HTML: get class inside class
Question
I am developing an Android application that uses JSOUP to parse HTML.
I have this HTML:
<div class='wrapper'>
<div style='margin:7px;'>
<div class='box' style='height:595px'>
<div class='boxtitlebox'>
<div class='boxtitle'><h4>13 RECENT CHORDS</h4></div><div class='clear'></div>
</div>
<div class='listitem'><a href='http://www.chordfrenzy.com/chord/9742/ungu-apa-sih-maumu-kord-lirik-lagu'>
<div class='subtitle'>Chord Ungu</div>
<div class='title'>Apa Sih Maumu</div>
</a></div>
<div class='listitem'><a href='http://www.chordfrenzy.com/chord/6826/slank-boneka-tersayang-kord-lirik-lagu'>
<div class='subtitle'>Chord Slank</div>
<div class='title'>Boneka Tersayang</div>
</a></div>
<div class='listitem'><a href='http://www.chordfrenzy.com/chord/6751/ari-lasso-rayuan-gombal-kord-lirik-lagu'>
<div class='subtitle'>Chord Ari Lasso</div>
<div class='title'>Rayuan Gombal</div>
</a></div>
</div>
</div>
</div>
Now I am confused: how can I get each link (href), subtitle and title above?
I need them to fill my arrays like this:
String[] link = {"http://www.chordfrenzy.com/chord/9742/ungu-apa-sih-maumu-kord-lirik-lagu", "http://www.chordfrenzy.com/chord/6826/slank-boneka-tersayang-kord-lirik-lagu", "http://www.chordfrenzy.com/chord/6751/ari-lasso-rayuan-gombal-kord-lirik-lagu"};
String[] subtitle = {"Chord Ungu", "Chord Slank", "Chord Ari Lasso"};
String[] title = {"Apa Sih Maumu", "Boneka Tersayang", "Rayuan Gombal"};
Any idea?
Recommended answer
In general, you should prefer the Selector API (select()) over the DOM methods (getElementsByX()).
Here's an example:
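import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// html holds the page source as a String
Document doc = Jsoup.parse(html);
// Links: a[href] matches every <a> that has an href attribute
List<String> links = new ArrayList<>();
for( Element element : doc.select("a[href]") )
{
links.add(element.attr("href"));
}
// Subtitles: text of every <div> whose class attribute is "subtitle"
List<String> subtitles = new ArrayList<>();
for( Element element : doc.select("div[class=subtitle]") )
{
subtitles.add(element.text());
}
// Titles: text of every <div> whose class attribute is "title"
List<String> titles = new ArrayList<>();
for( Element element : doc.select("div[class=title]") )
{
titles.add(element.text());
}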
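The three loops above rely on the links, subtitles and titles appearing in the same order and count, and a[href] matches every link in the document, so on a full page it would also pick up any other links. A minimal sketch of an alternative, assuming jsoup is on the classpath and the markup matches the snippet in the question, walks each div.listitem once so the three values of an item stay together, then converts the lists to the String[] arrays the question asks for. The class name ChordListParser and the Result holder are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ChordListParser {

    // Hypothetical holder for the three parallel arrays from the question.
    public static final class Result {
        public final String[] links;
        public final String[] subtitles;
        public final String[] titles;

        Result(String[] links, String[] subtitles, String[] titles) {
            this.links = links;
            this.subtitles = subtitles;
            this.titles = titles;
        }
    }

    public static Result parse(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        List<String> subtitles = new ArrayList<>();
        List<String> titles = new ArrayList<>();

        // Only the <a> that is a direct child of each div.listitem.
        for (Element a : doc.select("div.listitem > a[href]")) {
            links.add(a.attr("href"));
            // Elements.text() returns "" when nothing matches,
            // so a missing subtitle or title does not throw.
            subtitles.add(a.select("div.subtitle").text());
            titles.add(a.select("div.title").text());
        }

        // Convert to the String[] arrays the question wants.
        return new Result(
                links.toArray(new String[0]),
                subtitles.toArray(new String[0]),
                titles.toArray(new String[0]));
    }

    public static void main(String[] args) {
        String html = "<div class='listitem'>"
                + "<a href='http://www.chordfrenzy.com/chord/9742/ungu-apa-sih-maumu-kord-lirik-lagu'>"
                + "<div class='subtitle'>Chord Ungu</div>"
                + "<div class='title'>Apa Sih Maumu</div></a></div>";
        Result r = parse(html);
        for (int i = 0; i < r.links.length; i++) {
            System.out.println(r.subtitles[i] + " | " + r.titles[i] + " | " + r.links[i]);
        }
    }
}
```

Because each item is read from its own anchor, an item with a missing subtitle yields an empty string at the same index instead of shifting the remaining entries.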
Elements are selected by tag and attribute. If the tags differ or are not relevant, you can drop the tag from the selector (e.g. [class=title] instead of div[class=title]). Take a look at the Selector API documentation for more tips.
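The attribute form [class=title] compares the whole class attribute, while jsoup's class selector .title matches any element that has title among its classes; the two diverge once an element carries more than one class. A small sketch, assuming jsoup is on the classpath and using made-up markup in which one title has an extra, hypothetical class new, counts what each query matches. SelectorComparison and countMatches are illustrative names.

```java
import org.jsoup.Jsoup;

public class SelectorComparison {

    // Number of elements in the parsed html that match the CSS query.
    static int countMatches(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Hypothetical markup: the second div has an extra class.
        String html = "<div class='title'>Apa Sih Maumu</div>"
                + "<div class='title new'>Boneka Tersayang</div>"
                + "<span class='title'>Rayuan Gombal</span>";

        // Whole class attribute must equal "title", tag must be div.
        System.out.println(countMatches(html, "div[class=title]")); // 1
        // Tag dropped: the span now matches too.
        System.out.println(countMatches(html, "[class=title]"));    // 2
        // Class selector: any div having "title" among its classes.
        System.out.println(countMatches(html, "div.title"));        // 2
        // Any element having "title" among its classes.
        System.out.println(countMatches(html, ".title"));           // 3
    }
}
```

For markup like the question's, where each element has a single class, div.subtitle and div[class=subtitle] select the same elements; the class selector is simply the more forgiving choice if the site later adds classes.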