HTML parsing using Jsoup selector combinations in Android


Problem description


I want to parse the values that follow <dt>Seeders:</dt> and <dt>Leechers:</dt> from an HTML page using Jsoup. The full HTML is below.

<div id="details">
    <dl class="col1">
        <dt>Type:</dt>
        <dd><a href="/browse/101" title="More from this category">Audio &gt; Music</a></dd>

        <dt>Files:</dt>
                <dd><a href="/torrent/8682317/" title="Files" onclick="
if (filelist &lt; 1) {
        new Ajax.Updater('filelistContainer', '/ajax_details_filelist.php', {method: 'get', parameters: 'id=8682317'});
        filelist=1;
}; toggleFilelist(); return false;">28</a></dd>

        <dt>Size:</dt>
        <dd>222.65&nbsp;MiB&nbsp;(233468815&nbsp;Bytes)</dd>
        <br />



                    <dt>Tag(s):</dt>
            <dd><a href="/tag/markus">markus</a> <a href="/tag/schulz">schulz</a> <a href="/tag/dakota">dakota</a> <a href="/tag/things">things</a> <a href="/tag/trance">trance</a> <a href="/tag/armada">armada</a> <a href="/tag/2011">2011</a> <a href="/tag/inspiron">inspiron</a> </dd>
                <br />
        <dt>Uploaded:</dt>
        <dd>2013-07-13 15:30:25 GMT</dd>
        <dt>By:</dt>
        <dd>
        <a href="/user/-inspiron-/" title="Browse -inspiron-">-inspiron-</a>&nbsp;<img src="/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border='0' /></dd>
        <br />

        <dt>Seeders:</dt>
        <dd>16</dd>

        <dt>Leechers:</dt>
        <dd>1</dd>

        <dt>Comments</dt>
        <dd><span id="NumComments">0</span>
                &nbsp;
                </dd>

        <br />
        <dt>Info Hash:</dt><dd>&nbsp;</dd>
        01DD6B7325C3DB5F0DF5BBE510FD3FD9738D1C88    </dl>
<div class="torpicture">
<img src="//image.bayimg.com/345b5b11734bb9973863359cc52929f3ddc45205.jpg" title="picture" alt="picture" />
</div>
    <dl class="col2">
    </dl>

    <div id="CommentDiv" style="display:none;">
        <form method="post" id="commentsform" name="commentsform" onsubmit="new Ajax.Updater('NumComments', '/ajax_post_comment.php', {evalScripts:true, asynchronous:true, parameters:Form.serialize(this)}); return false;" action="/ajax_post_comment.php">
            <p class="info">
                <textarea name="add_comment" id="add_comment" rows="8" cols="50"></textarea><br/>
                <input type="hidden" name="id" value="8682317"/>
                <input type="submit" value="Submit" /><input type="button" value="Hide" onclick="document.getElementById('CommentDiv').style.display = 'none'" />
            </p>
        </form>
    </div>
        <br/>
        <br/>
<div id="social">
</div>

         <iframe src="http://cdn1.adexprt.com/dl/dl.php?b=bar&r=75&n=Markus_Schulz_-_Global_DJ_Broadcast_%282013-07-11%29_%28Inspiron%29&m=magnet%3A%3Fxt%3Durn%3Abtih%3A01dd6b7325c3db5f0df5bbe510fd3fd9738d1c88%26dn%3DMarkus%2BSchulz%2B-%2BGlobal%2BDJ%2BBroadcast%2B%25282013-07-11%2529%2B%2528Inspiron%2529%26tr%3Dudp%253A%252F%252Ftracker.openbittorrent.com%253A80%26tr%3Dudp%253A%252F%252Ftracker.publicbt.com%253A80%26tr%3Dudp%253A%252F%252Ftracker.istole.it%253A6969%26tr%3Dudp%253A%252F%252Ftracker.ccc.de%253A80%26tr%3Dudp%253A%252F%252Fopen.demonii.com%253A1337" width="622" height="51" frameborder="0" scrolling="no"></iframe>
    <br /><br />    <div class="download">
            <a style='background-image: url("/static/img/icons/icon-magnet.gif");' href="magnet:?xt=urn:btih:01dd6b7325c3db5f0df5bbe510fd3fd9738d1c88&dn=Markus+Schulz+-+Global+DJ+Broadcast+%282013-07-11%29+%28Inspiron%29&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.istole.it%3A6969&tr=udp%3A%2F%2Ftracker.ccc.de%3A80&tr=udp%3A%2F%2Fopen.demonii.com%3A1337" title="Get this torrent">&nbsp;Get this torrent</a> 

                <a style='background-image: url("/static/img/icon-https.gif");' href="http://adexprt.me/get/Markus_Schulz_-_Global_DJ_Broadcast_%282013-07-11%29_%28Inspiron%29?tag=bal" title="Anonymous Download">&nbsp;Anonymous Download</a>
    </div>
        <div>(Problems with magnets links are fixed by upgrading your <a href="http://www.bitlordapp.com/d/btl1/?sr=irm&chnl=details" target="_blank">torrent client</a>!)</div>

    <div class="nfo">
<pre>=======================================================
Site: http://www.inspirontrance.com/
=======================================================


=======================================================
F B Page: Inspiron Trance
=======================================================


=======================================================
TWITTER : inspiron22
======================================================= 


Markus Schulz
01. Mobil - One Morning (Aleksey Sladkov Remix)
02. Store N Forward - Nuts
03. Alter Future vs. Holbrook &#38; SkyKeeper - Megapolis
04. Danilo Ercole - Cruzer
05. Aaron Camz - Emission
06. Markus Schulz Featuring Sarah Howells - Tempted
07. M.I.K.E. Presents Caromax - Inner Thoughts
08. Ruffault - Progressive Dream
09. Styller - What We Left Behind
10. Meridian - Exit
11. Lange - A Different Shade of Crazy
12. Tucandeo Featuring Natalie Gioia - Disappear (Xtigma Remix)
13. Sebastian Weikum - Sky is the Limit
14. Markus Schulz - Don&#39;t Leave Until the Sunrise

Guy J
01. Roger Martinez &#38; Secret Cinema - Menthol Raga (Guy J Remix)
02. Ambassador - The Fade (Guy J Remix)
03. Guy J - Seven
04. Echomen &#9516;&#251; Perpetual (Guy J Remix)

Back with Markus Schulz
15. Mauro Picotto &#38; Riccardo Ferri - New Time, New Place (New World Punx Remix)
16. Grube &#38; Hovsepian - Trickster
17. Nifra - Waves
18. Markus Schulz featuring Dauby - Perfect (Digital X Remix) [Global Selection]
19. Basil O&#39;Glue - Gilgamesh
20. Skytech - The Other Side
21. ID


Enjoy
(Inspiron)      </pre>
    </div>

I've used this code, but it returns the text of the whole details block instead of just the seeders and leechers values:

try {
    document = Jsoup.connect(BLOG_URL).get();
    title = document.title();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
// selector query: matches the entire details container
Elements nodeBlogStats = document.select("div#details");
// check results
if (nodeBlogStats.size() > 0) {
    // get value: text() returns the combined text of the element and all its children
    result = nodeBlogStats.get(0).text();
}

Solution

According to http://jsoup.org/apidocs/org/jsoup/select/Selector.html, you are looking for

E ~ F an F element preceded by sibling E

and

:contains(text) elements that contain the specified text (the search is case-insensitive).

I would try

Element seeders = document.select("dt:contains(Seeders) ~ dd").get(0);
Element leechers = document.select("dt:contains(Leechers) ~ dd").get(0);
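
Note that get(0) throws an IndexOutOfBoundsException when the selector matches nothing, for example if the page layout changes. Below is a minimal, self-contained sketch of the same selectors with a null check instead. The class name SeederLeecherParser, the helper valueFor, and the inline HTML string are assumptions made for illustration (the HTML is a trimmed-down copy of the markup in the question); only Jsoup.parse, select, first, and text are actual Jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SeederLeecherParser {

    // Hypothetical helper: returns the text of the first <dd> sibling that
    // follows a <dt> containing the given label, or null if nothing matches.
    static String valueFor(Document document, String label) {
        // "dt:contains(X) ~ dd" matches every <dd> sibling after the <dt>;
        // first() picks the earliest one and returns null on an empty result.
        Element dd = document.select("dt:contains(" + label + ") ~ dd").first();
        return dd == null ? null : dd.text();
    }

    public static void main(String[] args) {
        // Trimmed-down copy of the relevant markup from the question (assumed input)
        String html =
              "<div id=\"details\"><dl class=\"col1\">"
            + "<dt>Seeders:</dt><dd>16</dd>"
            + "<dt>Leechers:</dt><dd>1</dd>"
            + "<dt>Comments</dt><dd><span id=\"NumComments\">0</span></dd>"
            + "</dl></div>";
        Document document = Jsoup.parse(html);

        System.out.println("Seeders: " + valueFor(document, "Seeders"));   // Seeders: 16
        System.out.println("Leechers: " + valueFor(document, "Leechers")); // Leechers: 1
    }
}
```

In the Android code from the question, the same helper could be called on the document returned by Jsoup.connect(BLOG_URL).get(). If only the <dd> immediately after the <dt> is wanted, the adjacent-sibling combinator (dt:contains(Seeders) + dd) is a stricter alternative.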
