Get content of list of span elements with HTMLUnit and XPath

Question

I want to get a list of values from an HTML document. I am using HTMLUnit.

There are many span elements with the class topic. I want to extract the content within the span tags:

<span class="topic">
  <a href="http://website.com/page/2342" class="id-24223 topic-link J_onClick topic-info-hover">Lean Startup</a>
 </span>

My code looks like this:

    List<?> topics = (List)page.getByXPath("//span[@class='topic']/text()");

However, whenever I try to iterate over the list I get a NoSuchElementException. Can anyone see an obvious mistake? Links to good tutorials would also be appreciated.

Solution

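If you know you'll always have an <a> inside the span, just add it to the XPath (//span[@class='topic']/a/text()) and get the text() from the a. The span's own text() children are only the whitespace around the link.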
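
To make the difference between the two expressions concrete, here is a minimal sketch. It uses the JDK's built-in javax.xml.xpath engine rather than HtmlUnit, purely so it runs without extra dependencies; that swap, the wrapping div, and the names XPathTopicDemo and evaluate are my assumptions, not part of the question. The same expression strings can be passed to page.getByXPath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTopicDemo {
    // Markup from the question, wrapped in a root <div> (an assumption) so it parses as XML.
    static final String HTML =
        "<div><span class=\"topic\">\n"
        + "  <a href=\"http://website.com/page/2342\" "
        + "class=\"id-24223 topic-link J_onClick topic-info-hover\">Lean Startup</a>\n"
        + " </span></div>";

    // Evaluates an XPath expression that selects text nodes and
    // returns the trimmed value of every matched node.
    static List<String> evaluate(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(HTML)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Direct text children of the span: only the whitespace around the <a>.
        System.out.println(evaluate("//span[@class='topic']/text()"));
        // Stepping into the <a> reaches the visible link text.
        System.out.println(evaluate("//span[@class='topic']/a/text()"));
    }
}
```

The first call matches only whitespace nodes, which trim to empty strings; the second returns the link text.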

If you can't be sure there will always be an a in there, I'd recommend using the .asText() method, which HtmlElement and all of its subclasses have.

So first get each of the spans:

List<?> topics = page.getByXPath("//span[@class='topic']");

And then, in the loop, get the text inside each of the spans:

String text = ((HtmlElement) topic).asText();  // topic comes out of List<?> as Object, so cast it first
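
The element-first approach can be sketched the same way. This is a rough analogue only: it uses the JDK's XPath engine and DOM getTextContent() in place of HtmlUnit's asText() (which, as far as I know, newer HtmlUnit releases call asNormalizedText()), so whitespace handling will not match exactly. The second span, the class name SpanTextDemo, and the method topicTexts are hypothetical, added to show that a span without an a works too.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SpanTextDemo {
    // First span follows the question's markup (attributes shortened);
    // the second, plain-text span is a made-up example.
    static final String HTML =
        "<div>"
        + "<span class=\"topic\">\n  <a href=\"http://website.com/page/2342\">Lean Startup</a>\n </span>"
        + "<span class=\"topic\">Bootstrapping</span>"
        + "</div>";

    // Selects the span elements themselves, then reads each one's full text.
    static List<String> topicTexts() throws Exception {
        NodeList spans = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
            "//span[@class='topic']",
            new InputSource(new StringReader(HTML)),
            XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < spans.getLength(); i++) {
            // getTextContent() concatenates all descendant text,
            // so it works whether or not the span contains an <a>.
            texts.add(spans.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        for (String text : topicTexts()) {
            System.out.println(text);
        }
    }
}
```

Selecting elements instead of text nodes keeps the XPath independent of how the text is nested inside each span.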
