How to parse an HTML file using Jsoup with multiple class-name elements?


Problem description

The Java code below works fine for an HTML file when searching for a single class, e.g. css-sched-table-title.

However, I have multiple class names to find in the HTML file, e.g. css-sched-waypoints and css-sched-times. How do I combine the search using the getElementsByClass method in jsoup? I don't want to write the code multiple times, because I want to preserve the order. What I want is something like:

doc.getElementsByClass("css-sched-table-title") || doc.getElementsByClass("css-sched-waypoints");

    Document doc = Jsoup.parse(content);

    Elements ele = doc.getElementsByClass("css-sched-table-title");
    for (Element link : ele) {
        String linkText = link.text();
        System.out.println(linkText);
    }

<tr ALIGN="CENTER">
           <td CLASS="css-sched-times">&nbsp;</td>
           <td CLASS="css-sched-times">6:15</td>
           <td CLASS="css-sched-times">&nbsp;</td>
           <td CLASS="css-sched-times">6:20</td>
           <td CLASS="css-sched-times">&nbsp;</td>
           <td CLASS="css-sched-times">6:24</td>
           <td CLASS="css-sched-times">&nbsp;</td>
           <td CLASS="css-sched-times">6:34</td>
           <td CLASS="css-sched-times">&nbsp;</td>
           <td CLASS="css-sched-times">6:34</td>
           <td CLASS="css-sched-times">&nbsp;</td>
           <td CLASS="css-sched-times">6:40</td>
           <td CLASS="css-sched-times">&nbsp;</td>
           <td CLASS="css-sched-times">6:46</td>
           <td CLASS="css-sched-times">&nbsp;</td>
           <td CLASS="css-sched-times">6:54</td>
</tr>
<tr VALIGN="BOTTOM">
           <TD>&nbsp;</TD>
           <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Townline and Southern</TD>
           <TD>&nbsp;</TD>
           <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and Blueridge</TD>
           <TD>&nbsp;</TD>
           <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and South Fraser</TD>
           <TD>&nbsp;</TD>
           <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Ar. Bourquin Exchange</TD>
           <TD>&nbsp;</TD>
           <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Lv. Bourquin Exchange</TD>
           <TD>&nbsp;</TD>
           <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Downtown Abbotsford</TD>
           <TD>&nbsp;</TD>
           <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">McMillan and Old Yale</TD>
           <TD>&nbsp;</TD>
           <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Sandy Hill and Old Clayburn</TD>
   </tr>

 <tr ALIGN="CENTER">
      <td CLASS="css-sched-times">&nbsp;</td>
      <td CLASS="css-sched-times">8:12</td>
      <td CLASS="css-sched-times">&nbsp;</td>
      <td CLASS="css-sched-times">8:17</td>
      <td CLASS="css-sched-times">&nbsp;</td>
      <td CLASS="css-sched-times">8:21</td>
      <td CLASS="css-sched-times">&nbsp;</td>
      <td CLASS="css-sched-times">8:31</td>
      <td CLASS="css-sched-times">&nbsp;</td>
      <td CLASS="css-sched-times">8:34</td>
      <td CLASS="css-sched-times">&nbsp;</td>
      <td CLASS="css-sched-times">8:40</td>
      <td CLASS="css-sched-times">&nbsp;</td>
      <td CLASS="css-sched-times">8:46</td>
      <td CLASS="css-sched-times">&nbsp;</td>
      <td CLASS="css-sched-times">8:54</td>
    </tr>

Recommended answer

Taking cues from your earlier query, when I combine the three td selectors using valid Selector syntax, I get the result you are expecting.

doc.select("td[class=css-sched-table-title], td[class=css-sched-waypoints], td[class=css-sched-times]")

Note that you can combine multiple conditions within a single selector by separating them with commas, which effectively acts as your OR operator:

    Elements row = doc.select("td[class=css-sched-table-title], td[class=css-sched-waypoints], td[class=css-sched-times]");
    System.out.println("::Total Count::" + row.size());

    Iterator<Element> iterator = row.listIterator();
    while (iterator.hasNext()) {
        Element element = iterator.next();
        String id = element.attr("id");
        String classes = element.attr("class");
        String value = element.text();
        System.out.println("Id : " + id + ", classes : " + classes
                + ", value : " + value);
    }

This gives:

::Total Count::25
Id : , classes : css-sched-table-title, value : Saturday - Afternoon
Id : , classes : css-sched-waypoints, value : Townline and Southern
Id : , classes : css-sched-waypoints, value : Clearbrook and Blueridge
Id : , classes : css-sched-waypoints, value : Clearbrook and South Fraser
Id : , classes : css-sched-waypoints, value : Ar. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Lv. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Downtown Abbotsford
Id : , classes : css-sched-waypoints, value : McMillan and Old Yale
Id : , classes : css-sched-waypoints, value : Sandy Hill and Old Clayburn
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:15
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:20
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:24
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:34
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:34
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:40
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:46
Id : , classes : css-sched-times, value :  
Id : , classes : css-sched-times, value : 6:54
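
One caveat with the attribute form: td[class=...] compares the whole class attribute value, so it would miss a cell that carries more than one class (for example class="css-sched-times highlight"). A minimal sketch of the same OR-style query using class selectors (.name) instead is shown here. The class MultiClassSelect and its helpers classGroup and textsByClasses are hypothetical names introduced for illustration, and the sample HTML is a cut-down stand-in for the schedule table. The rows are wrapped in a table element because an HTML parser drops tr and td tags that appear outside one.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MultiClassSelect {

    // Hypothetical helper: turns plain class names into a selector group such as ".a, .b".
    static String classGroup(String... classNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : classNames) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append('.').append(name);
        }
        return sb.toString();
    }

    // Hypothetical helper: returns "class attribute: text" for every element
    // that carries at least one of the given classes.
    static List<String> textsByClasses(String html, String... classNames) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element el : doc.select(classGroup(classNames))) {
            out.add(el.className() + ": " + el.text());
        }
        return out;
    }

    public static void main(String[] args) {
        // Cut-down sample; the real page has many more cells per row.
        String html = "<table>"
                + "<tr><td class=\"css-sched-times\">6:15</td></tr>"
                + "<tr><td class=\"css-sched-waypoints\">Townline and Southern</td></tr>"
                + "</table>";
        for (String line : textsByClasses(html, "css-sched-waypoints", "css-sched-times")) {
            System.out.println(line);
        }
    }
}
```

In recent jsoup releases a comma-separated group is evaluated in a single pass over the document, so matches should come back in document order regardless of the order in which the class names are listed. Older releases may group the results by selector instead, so it is worth checking against the version in use.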

For detailed usage of the Selector syntax, refer to the jsoup Selector documentation.
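
If the goal of preserving order is really to keep each table row together, one option is to walk the tr elements and run the combined selector inside each row. This is only a sketch under that assumption: the class RowWiseCells and its method rows are hypothetical names, it assumes the schedule table has no nested tables, and it skips rows that contain none of the wanted cells.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowWiseCells {

    // Returns one list of cell texts per table row, keeping rows in document order.
    static List<List<String>> rows(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();
        for (Element tr : doc.select("tr")) {
            List<String> cells = new ArrayList<>();
            // The comma acts as OR: a cell matches if it has either class.
            for (Element td : tr.select("td.css-sched-waypoints, td.css-sched-times")) {
                cells.add(td.text());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Cut-down sample standing in for the schedule table.
        String html = "<table>"
                + "<tr><td class=\"css-sched-times\">6:15</td><td class=\"css-sched-times\">6:20</td></tr>"
                + "<tr><td>spacer</td><td class=\"css-sched-waypoints\">Townline and Southern</td></tr>"
                + "</table>";
        for (List<String> row : rows(html)) {
            System.out.println(row);
        }
    }
}
```

Each inner list then corresponds to one row of times or waypoints, which makes it easier to pair a time with its column position than a single flat list does.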
