Extract HTML Table (span) tags using Jsoup in Java

Question

I am trying to extract the td name and the span class. In the sample code, I want to extract the text of the a href (the "/accessory" link) within the first td, and the class of the span tag in the second td.

I want to print:
Mouse, is_present, Yes
KeyBoard, No
Dual-Monitor, is_present, Yes

When I use the Java code below, I get:
Mouse Yes
KeyBoard No
Dual-Monitor Yes

How do I get the span class name?

HTML Code

<tr>
  <td class="" width="1%" style="padding:0px;">
  </td>
  <td class="">
    <a href="/accessory">Mouse</a>
  </td>
  <td class="tright ">
    <span class='is_present'>Yes</span><br/>
  </td>
  <td class="tright ">
    &nbsp;<br/>
  </td>
</tr>

<tr>
  <td class="" width="1%" style="padding:0px;">
  </td>
  <td class="">
    <a href="/accessory"> KeyBoard</a>
  </td>
  <td colspan="2" class="" style='text-align:center;'>
    <small>No</small>
  </td>
</tr>

<tr>
  <td class="" width="1%" style="padding:0px;">
  </td>
  <td class="">
    <a href="/accessory">Dual-Monitor</a>
  </td>
  <td class="tright ">
    <span class='is_present'>Yes</span><br/>
  </td>
  <td class="tright ">
    &nbsp;<br/>
  </td>
</tr>

Java code

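import java.util.Iterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

private void printParse(String HTMLdata) {
    // Parse the raw HTML string into a Document before selecting from it
    Document data = Jsoup.parse(HTMLdata);

    Element table = data.select("table[class=computer_table]").first();

    Iterator<Element> ite = table.select("td").iterator();

    while (ite.hasNext()) {
        // text() returns only the visible text, so the span's class name is lost here
        System.out.println(ite.next().text());
    }
}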

Solution

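// Assumes doc is the parsed Document and that pNames, emails and
// moneyDollars are List<String> collections declared elsewhere.
Element table = doc.select("table[id=computer_table]").first();

Elements results = table.select("td");

for (Element dl : results) {
    // Visible text of the cell (names, but also "Yes" / "No" cells)
    if (!dl.text().equals("") && dl.text().length() > 1)
        pNames.add(dl.text());

    // Text of a <small> tag inside the cell, e.g. "No"
    if (!dl.select("small").text().equals("") && dl.select("small").text().length() > 1)
        emails.add(dl.select("small").text());

    // Class attribute of a <span> inside the cell, e.g. "is_present"
    if (!dl.select("span").attr("class").equals("") && dl.select("span").attr("class").length() > 1)
        moneyDollars.add(dl.select("span").attr("class"));
}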
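
The loop above collects names, texts and class names into three separate lists, so the pieces still have to be matched up to get one line per accessory. A minimal sketch of one way to pair them directly is shown here. The class name AccessoryParser, the method parseRows, and the sample HTML string (including its wrapping table element) are assumptions made for illustration and are not part of the original answer. It starts from each td that contains a link and reads the td right after it, so it does not depend on the rows being closed properly.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AccessoryParser {

    // Hypothetical helper: builds one comma-separated line per accessory link.
    static List<String> parseRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();

        // Start from every cell that holds a link instead of from <tr>,
        // so unclosed or missing row tags in the markup do not matter.
        for (Element nameCell : doc.select("td:has(a[href])")) {
            String name = nameCell.selectFirst("a[href]").text();

            // The status cell is the td immediately after the name cell.
            Element statusCell = nameCell.nextElementSibling();
            if (statusCell == null) {
                lines.add(name);
                continue;
            }

            Element span = statusCell.selectFirst("span");
            if (span != null) {
                // className() returns the value of the class attribute
                lines.add(name + ", " + span.className() + ", " + span.text());
            } else {
                // No span (e.g. a <small>No</small> cell): keep only the text
                lines.add(name + ", " + statusCell.text());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumed sample markup, shortened from the question and wrapped
        // in a <table> so the parser keeps the <tr>/<td> elements.
        String html =
              "<table class=\"computer_table\">"
            + "<tr><td></td><td><a href=\"/accessory\">Mouse</a></td>"
            + "<td class=\"tright\"><span class='is_present'>Yes</span><br/></td>"
            + "<td class=\"tright\">&nbsp;<br/></td></tr>"
            + "<tr><td></td><td><a href=\"/accessory\"> KeyBoard</a></td>"
            + "<td colspan=\"2\"><small>No</small></td></tr>"
            + "</table>";

        for (String line : parseRows(html)) {
            System.out.println(line);
        }
    }
}
```

With the assumed sample input, this should print "Mouse, is_present, Yes" followed by "KeyBoard, No". Note that the markup has to sit inside a table element when it is parsed as a whole document, since tr and td tags outside a table are dropped by an HTML5 parser.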
