Get content of list of span elements with HTMLUnit and XPath
I want to get a list of values from an HTML document. I am using HTMLUnit.
There are many span elements with the class topic. I want to extract the content within the span tags:
<span class="topic">
<a href="http://website.com/page/2342" class="id-24223 topic-link J_onClick topic-info-hover">Lean Startup</a>
</span>
My code looks like this:
List<?> topics = (List)page.getByXPath("//span[@class='topic']/text()");
However, whenever I try to iterate over the list I get a NoSuchElementException. Can anyone see an obvious mistake? Also, links to good tutorials would be appreciated.
If you know you'll always have an <a>, then just add it to the XPath and get the text() from the a.
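To make the first suggestion concrete, here is a minimal sketch of the expression with the a step added. It is an illustration only: it uses the JDK's built-in javax.xml.xpath instead of HTMLUnit so that it runs without extra dependencies, and the class name TopicLinkText, the select helper and the wrapping div are my own assumptions, not part of the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TopicLinkText {
    // Markup from the question, wrapped in a <div> (an assumption) so it parses as XML.
    static final String HTML =
        "<div>"
        + "<span class=\"topic\">\n"
        + "<a href=\"http://website.com/page/2342\""
        + " class=\"id-24223 topic-link J_onClick topic-info-hover\">Lean Startup</a>\n"
        + "</span>"
        + "</div>";

    // Evaluates an XPath expression against HTML and returns the value of each selected node.
    static List<String> select(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(HTML)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Text nodes inside the <a>: this is where "Lean Startup" lives.
        System.out.println(select("//span[@class='topic']/a/text()"));
        // Text nodes directly under the <span>: only the line breaks around the <a>.
        System.out.println(select("//span[@class='topic']/text()"));
    }
}
```

For this snippet, the text() children of the span itself are only whitespace, because the visible text sits one level deeper, inside the a. With HTMLUnit the same expression string would be passed to page.getByXPath.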
If you don't know whether there will always be an a in there, then I'd recommend using the .asText() method that all HtmlElement objects and their descendants have.
So first get each of the spans:
List<?> topics = (List)page.getByXPath("//span[@class='topic']");
And then, in the loop, get the text inside each of the spans:
topic.asText();
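The two steps above (select each span, then read its text in a loop) can be sketched as follows. This is a dependency-free stand-in, not the HTMLUnit code itself: it uses the JDK's javax.xml.xpath, and Node.getTextContent().trim() plays the role of asText(), which is only a rough equivalent. The class name TopicSpanText, the wrapping div and the second span without a link are assumptions added for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TopicSpanText {
    // First span is from the question; the second (no <a>) is a hypothetical extra case.
    static final String HTML =
        "<div>"
        + "<span class=\"topic\">\n"
        + "<a href=\"http://website.com/page/2342\""
        + " class=\"id-24223 topic-link J_onClick topic-info-hover\">Lean Startup</a>\n"
        + "</span>"
        + "<span class=\"topic\">Plain Topic</span>"
        + "</div>";

    // Selects every topic span, then collects the text found anywhere inside each one.
    static List<String> topicTexts() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList spans = (NodeList) xpath.evaluate(
            "//span[@class='topic']",
            new InputSource(new StringReader(HTML)),
            XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < spans.getLength(); i++) {
            // getTextContent() joins all descendant text; trim() drops the surrounding line breaks.
            texts.add(spans.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(topicTexts());
    }
}
```

Because the text is read from the whole span, this works whether or not the a is present. In HTMLUnit itself, each item of the list would be cast to HtmlElement before calling asText(); as far as I know, newer HTMLUnit releases offer asNormalizedText() as the replacement for asText(), so check the version you are on.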