JSoup parsing data from within a tag
Problem description
I am managing to parse most of the data I need except for one value: it is contained within the a href tag, and I need the number that appears after "mmsi=".
<a href="/showship.php?mmsi=235083844">Sunsail 4013</a>
My current parser, below, fetches all the other data I need. I tried a few things; the commented-out code occasionally returns "unspecified" for an entry. Is there any way I can add to my code so that, when the data is returned, the number "235083844" comes before the name "Sunsail 4013"?
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// (inside a method)
try {
    File input = new File("shipMove.txt");
    // null charset: jsoup reads it from the document's meta tag, or falls back to UTF-8
    Document doc = Jsoup.parse(input, null);
    Elements tables = doc.select("table.shipInfo");
    for (Element element : tables) {
        Elements tdTags = element.select("td");
        //Elements mmsi = element.select("a[href*=/showship.php?mmsi=]");
        // Iterate over all 'td' tags found
        for (Element td : tdTags) {
            // Print its text if not empty
            final String text = td.text();
            if (!text.isEmpty()) {
                System.out.println(text);
            }
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
Example of data parsed and html file here
Recommended answer
- You can use attr on an Element object to retrieve a particular attribute's value.
- Use substring to get the required value if the String pattern is consistent.
Code
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Using just your anchor html tag
String html = "<a href=\"/showship.php?mmsi=235083844\">Sunsail 4013</a>";
Document doc = Jsoup.parse(html);
// Just selecting the anchor tag; for your implementation use a more generic selector
Element link = doc.select("a").first();
if (link != null) {
    // Get the attribute value
    String url = link.attr("href");
    // Take the substring after '=' (if there is no '=', indexOf returns -1 and the whole URL is kept)
    String id = url.substring(url.indexOf('=') + 1);
    System.out.println(id + " " + link.text());
}
Gives:
235083844 Sunsail 4013
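The substring call above relies on the href containing exactly one '=', right before the number. If a link might carry other query parameters (for example a hypothetical /showship.php?lang=en&mmsi=235083844), a small helper that looks specifically for the mmsi parameter is more robust. This is a minimal sketch using only java.util.regex; the class and method names (MmsiExtractor, extractMmsi) are illustrative and not part of jsoup.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MmsiExtractor {
    // Matches "mmsi=" right after '?' or '&' and captures the digits that follow.
    private static final Pattern MMSI = Pattern.compile("[?&]mmsi=(\\d+)");

    // Returns the MMSI from an href such as "/showship.php?mmsi=235083844",
    // or an empty Optional if there is no numeric mmsi parameter.
    public static Optional<String> extractMmsi(String href) {
        if (href == null) {
            return Optional.empty();
        }
        Matcher m = MMSI.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractMmsi("/showship.php?mmsi=235083844").orElse("n/a"));
        System.out.println(extractMmsi("/showship.php?lang=en&mmsi=235083844").orElse("n/a"));
        System.out.println(extractMmsi("/about.php").orElse("n/a"));
    }
}
```

In the jsoup code you would pass link.attr("href") to extractMmsi and print the number only when it is present.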
Modify the condition in your for loop code:
...
for (Element td : tdTags) {
    // Print its text if not empty
    final String text = td.text();
    if (!text.isEmpty()) {
        // Look up the first anchor in the cell once and reuse it
        Element link = td.getElementsByTag("a").first();
        if (link != null) {
            // Get the attribute value
            String url = link.attr("href");
            // Take the substring from after '=' onwards
            String id = url.substring(url.indexOf('=') + 1);
            System.out.println(id + " " + text);
        } else {
            System.out.println(text);
        }
    }
}
...
The above code would print the desired output.
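For reference, the question's loop with the answer's change folded in can be sketched as one self-contained class. It assumes jsoup is on the classpath and parses an inline HTML string instead of shipMove.txt; the table markup and the ShipInfoParser.parse name are hypothetical, modelled on the single link shown in the question. It narrows the link lookup to anchors whose href contains "mmsi=" (a shortened form of the selector in the question's commented-out line) and collects the lines in a list instead of printing them directly.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ShipInfoParser {
    // Returns one line per non-empty <td> in every table.shipInfo.
    // Cells that contain an mmsi link are prefixed with the MMSI number.
    public static List<String> parse(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element table : doc.select("table.shipInfo")) {
            for (Element td : table.select("td")) {
                String text = td.text();
                if (text.isEmpty()) {
                    continue; // skip empty cells, as the original loop does
                }
                // Only anchors whose href contains "mmsi=" are treated as ship links
                Element link = td.select("a[href*=mmsi=]").first();
                if (link != null) {
                    String href = link.attr("href");
                    String id = href.substring(href.indexOf("mmsi=") + "mmsi=".length());
                    lines.add(id + " " + text);
                } else {
                    lines.add(text);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical markup modelled on the link in the question
        String html = "<table class=\"shipInfo\"><tr>"
                + "<td><a href=\"/showship.php?mmsi=235083844\">Sunsail 4013</a></td>"
                + "<td>Sailing</td><td></td>"
                + "</tr></table>";
        parse(html).forEach(System.out::println);
    }
}
```

Running main prints "235083844 Sunsail 4013" followed by "Sailing"; the empty cell is skipped.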