JSoup - How to parse nested texts?

Question

I'm parsing the HTML of a website with JSoup. I want to parse this part:

<td class="lastpost">
This is a text 1<br>
<a href="post/13594">Website Page - 1</a>
</td>

I want to end up with this:

String text = "This is a text 1";
String textNo = "Website Page - 1";
String link = "post/13594";

How can I get the parts like this?

Recommended answer

Your code would only get all the text inside the td elements you are selecting. To store each piece in its own variable, grab the parts separately, as in the following code. The comments explain how and why each piece is retrieved.

// Get the first td element that has class="lastpost"
Element lastPost = document.select("td.lastpost").first();
// Get the first a element inside the td (getElementsByTag also matches nested descendants)
Element linkElement = lastPost.getElementsByTag("a").first();

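// This text is a direct text node of the td. ownText() returns only the td's own text,
// whitespace-normalized and trimmed, so the link text is excluded. Calling
// childNode(0).toString() instead returns the node's outer HTML, which can carry
// leading whitespace and escaped entities.
String text = lastPost.ownText();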
// This is the text within the a (link) element
String textNo = linkElement.text();
// This text is the href attribute value of the a (link) element
String link = linkElement.attr("href");
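
For reference, the snippet above can be put into a self-contained program along these lines. This is only a sketch under a few assumptions: the class name LastPostExample, the extract helper, and the SAMPLE_HTML constant are hypothetical names introduced for illustration, and the sample markup is wrapped in a table and tr because jsoup's HTML parser would otherwise discard a td that appears outside a table. It also adds null checks, since first() returns null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LastPostExample {
    // Markup from the question, wrapped in <table><tr> (an assumption made here)
    // so that the HTML parser keeps the td element.
    static final String SAMPLE_HTML =
            "<table><tr><td class=\"lastpost\">\n"
            + "This is a text 1<br>\n"
            + "<a href=\"post/13594\">Website Page - 1</a>\n"
            + "</td></tr></table>";

    // Hypothetical helper: returns {text, textNo, link},
    // or null when the td or the link is missing.
    static String[] extract(String html) {
        Document document = Jsoup.parse(html);

        // First td element that has class="lastpost"
        Element lastPost = document.select("td.lastpost").first();
        if (lastPost == null) {
            return null;
        }

        // First a element inside that td
        Element linkElement = lastPost.getElementsByTag("a").first();
        if (linkElement == null) {
            return null;
        }

        String text = lastPost.ownText();        // the td's own text, without the link text
        String textNo = linkElement.text();      // text inside the a element
        String link = linkElement.attr("href");  // href attribute value
        return new String[] {text, textNo, link};
    }

    public static void main(String[] args) {
        String[] parts = extract(SAMPLE_HTML);
        if (parts == null) {
            System.out.println("No td.lastpost with a link found");
            return;
        }
        System.out.println("text   = " + parts[0]);
        System.out.println("textNo = " + parts[1]);
        System.out.println("link   = " + parts[2]);
    }
}
```

If the page contains several td.lastpost cells, the same three calls can be applied to each element returned by document.select("td.lastpost") instead of only the first one.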
