Parse a table from HTML using jsoup


Problem description

I've got another problem with scraping HTML text. Here's a sample of what I'm trying to extract from:

<table class="scripture">
  <tbody>
   <tr>
   <td class="verse" valign="top">
    <a name="2:1"></a><a class="vers" href="javascript:getParallel('LUK', 2, 1);" title="Klik om grondtekst en SV te zien">&nbsp;1&nbsp;</a>
   </td>
   <td class="content">
    <span class="main">En het geschiedde in die dagen dat er een gebod uitging van keizer Augustus dat heel de wereld ingeschreven moest worden.</span>
   </td>
   </tr>
  </tbody>
</table>

<table class="scripture">
  <tbody>
   <tr>
   <td class="verse" valign="top">
    <a name="2:2"></a><a class="vers" href="javascript:getParallel('LUK', 2, 2);" title="Klik om grondtekst en SV te zien">&nbsp;2&nbsp;</a>
   </td>
   <td class="content">
    <span class="main">Deze eerste inschrijving vond plaats toen Cyrenius over Syrië stadhouder was.</span>
   </td>
   </tr>
  </tbody>
</table>

This is similar to my problem in this link but I want to get the verse text and the Scripture content. How do I achieve this?

So far this is what I've tried:

Element table = doc.select("table[class=scripture]").first();
Log.e("BB", "passage1: " + table.ownText());

But it doesn't display anything. Any help would be appreciated. Thanks.

Solution

Assuming you want the content of the span inside the table that contains verse 2:2, you can select it with:

String verse = "2:2";
// Select the span of class main inside the table of class scripture
// that has a td of class verse containing an anchor whose name attribute equals verse
Element p = doc.select(
    String.format("table.scripture:has(td.verse a[name=%s]) span.main", verse)
).first();
System.out.println(p.text());

Output:

Deze eerste inschrijving vond plaats toen Cyrenius over Syrië stadhouder was.
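
If you want every verse rather than one, the same idea can be applied per table. The following is a minimal sketch, assuming each `table.scripture` holds exactly one `a[name]` anchor and one `span.main`, as in the sample above. The class name `VerseExtractor`, the method `extractVerses`, and the constant `SAMPLE_HTML` are made up for illustration, and the sample markup is trimmed (the `title` attributes are dropped). It uses `text()`, which collects the text of all descendants; `ownText()` only returns text sitting directly inside the element, and the `<table>` itself has none, which is most likely why the attempt in the question prints nothing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class VerseExtractor {

    // Trimmed copy of the markup from the question (title attributes omitted).
    // \u00eb is the character "ë", escaped to avoid source-encoding issues.
    static final String SAMPLE_HTML =
        "<table class=\"scripture\"><tbody><tr>"
        + "<td class=\"verse\" valign=\"top\"><a name=\"2:1\"></a>"
        + "<a class=\"vers\" href=\"javascript:getParallel('LUK', 2, 1);\">&nbsp;1&nbsp;</a></td>"
        + "<td class=\"content\"><span class=\"main\">En het geschiedde in die dagen "
        + "dat er een gebod uitging van keizer Augustus dat heel de wereld "
        + "ingeschreven moest worden.</span></td>"
        + "</tr></tbody></table>"
        + "<table class=\"scripture\"><tbody><tr>"
        + "<td class=\"verse\" valign=\"top\"><a name=\"2:2\"></a>"
        + "<a class=\"vers\" href=\"javascript:getParallel('LUK', 2, 2);\">&nbsp;2&nbsp;</a></td>"
        + "<td class=\"content\"><span class=\"main\">Deze eerste inschrijving vond plaats "
        + "toen Cyrenius over Syri\u00eb stadhouder was.</span></td>"
        + "</tr></tbody></table>";

    // Maps each verse reference (the anchor's name attribute, e.g. "2:2")
    // to the text of the matching span.main, in document order.
    static Map<String, String> extractVerses(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> verses = new LinkedHashMap<>();
        for (Element table : doc.select("table.scripture")) {
            Element anchor = table.selectFirst("td.verse a[name]");
            Element content = table.selectFirst("td.content span.main");
            if (anchor == null || content == null) {
                continue; // skip tables that do not follow the assumed layout
            }
            // text() includes descendant text; ownText() would not.
            verses.put(anchor.attr("name"), content.text());
        }
        return verses;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : extractVerses(SAMPLE_HTML).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

On Android you would replace the `System.out.println` call with `Log.e` (or another log level) as in the question. The null checks are there because `selectFirst` returns `null` when nothing matches, so a table with a different layout is skipped instead of throwing.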
