How to get contents between two tags in jsoup/javascript
Question
<p><strong>Chapter One</strong></p><p>A piece of computer code</p>
<table>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
<p><strong>Chapter Two</strong></p><p>Java in 10 minutes</p>
How can I get the content between those two "strong" elements, so that Chapter One ends up with "A piece of computer code" and the table? Calling nextSibling() on the "strong" element retrieves only one node. How do I collect all elements until I reach the next "strong"? Thanks
Recommended answer
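Is this format going to be consistent? If so, you can simply call nextElementSibling() twice, starting from the strong element's parent (p): once for the paragraph and once for the table. Note that nextSibling() returns nodes rather than elements, so it may hand back the whitespace text node between the two.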
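For the fixed-layout case, a minimal sketch might look like the following. It assumes jsoup is on the classpath and that every chapter heading is followed by exactly one paragraph and one table. The class name `FixedLayoutExample`, the helper `chapterBody`, and the shortened inline HTML are hypothetical and only there for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FixedLayoutExample {
    // Hypothetical helper: returns the two elements that follow the heading's <p>.
    static Element[] chapterBody(Element strong) {
        Element heading = strong.parent();             // the <p> wrapping <strong>
        Element first = heading.nextElementSibling();  // expected: the paragraph
        Element second = first == null ? null : first.nextElementSibling(); // expected: the table
        return new Element[] { first, second };
    }

    public static void main(String[] args) {
        // Shortened version of the markup from the question (assumption: same layout).
        String html = "<p><strong>Chapter One</strong></p><p>A piece of computer code</p>\n"
            + "<table><tr><td>Jill</td></tr></table>\n"
            + "<p><strong>Chapter Two</strong></p><p>Java in 10 minutes</p>";
        Document doc = Jsoup.parse(html);
        Element strong = doc.selectFirst("p > strong"); // first chapter heading
        Element[] body = chapterBody(strong);
        System.out.println(body[0].text());    // prints "A piece of computer code"
        System.out.println(body[1].tagName()); // prints "table"
    }
}
```

This only holds while the layout never changes; an extra paragraph or a missing table would silently return the wrong elements.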
If the format is going to vary, you will need to decide manually when to stop iterating through the siblings, for example by checking whether a sibling contains a strong element.
It all depends on the full context.
Here's an example with basic loops. You may want to add more checks or better queries for a different situation.
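import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect(url).get();
List<Elements> data = new ArrayList<>();
Elements chapters = doc.select("p > strong");
for (Element chapter : chapters) {
    if (!chapter.ownText().toLowerCase().contains("chapter"))
        continue; // this strong element isn't actually a chapter heading
    List<Element> siblings = new ArrayList<>();
    // the strong element is the only child of its p, so walk the siblings of that p
    Element next = chapter.parent().nextElementSibling();
    while (next != null) {
        // a heading keeps its title inside a strong child, so ownText() would be empty; use text()
        if (next.children().is("strong") && next.text().toLowerCase().contains("chapter"))
            break; // we've reached the next chapter heading
        siblings.add(next);
        next = next.nextElementSibling();
    }
    data.add(new Elements(siblings));
}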
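To try the loop without fetching a URL, here is a self-contained sketch that runs on the markup from the question. It assumes jsoup is on the classpath. The names `ChapterSplitter`, `isChapterHeading`, and `split` are hypothetical, and treating "a p with a strong child whose text contains 'chapter'" as a heading is an assumption taken from the sample HTML, not a general rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ChapterSplitter {
    // Assumption: a heading is a <p> with a <strong> child whose text mentions "chapter".
    static boolean isChapterHeading(Element el) {
        return el.tagName().equals("p")
            && el.children().is("strong")
            && el.text().toLowerCase().contains("chapter");
    }

    // Maps each chapter title to the elements that follow it, up to the next heading.
    static Map<String, Elements> split(Document doc) {
        Map<String, Elements> chapters = new LinkedHashMap<>();
        for (Element strong : doc.select("p > strong")) {
            Element heading = strong.parent();
            if (!isChapterHeading(heading)) continue; // bold text that is not a chapter title
            List<Element> body = new ArrayList<>();
            for (Element next = heading.nextElementSibling();
                 next != null && !isChapterHeading(next);
                 next = next.nextElementSibling()) {
                body.add(next);
            }
            chapters.put(strong.text(), new Elements(body));
        }
        return chapters;
    }

    public static void main(String[] args) {
        String html = "<p><strong>Chapter One</strong></p><p>A piece of computer code</p>\n"
            + "<table><tr><th>Firstname</th><th>Lastname</th><th>Age</th></tr>"
            + "<tr><td>Jill</td><td>Smith</td><td>50</td></tr></table>\n"
            + "<p><strong>Chapter Two</strong></p><p>Java in 10 minutes</p>";
        Map<String, Elements> chapters = split(Jsoup.parse(html));
        // Prints "Chapter One -> 2 element(s)" then "Chapter Two -> 1 element(s)"
        chapters.forEach((title, body) ->
            System.out.println(title + " -> " + body.size() + " element(s)"));
    }
}
```

Keying the result by chapter title keeps each chapter's paragraph and table together; calling `outerHtml()` on a chapter's `Elements` would give back the markup between the two headings.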