How to get the text inside a specific span with HtmlUnit


Question

I'm new to HtmlUnit and I'm not even sure it is the right tool for my project. I'm trying to parse a website and extract the values I need from it. I need to get the value "07:05" from this:

<span class="tim tim-dep">07:05</span>

I know that I can use getTextContent() to extract the value, but I don't know how to select a specific span. I used getElementById to find the <div> tag that this span belongs to, but when I get the text content of that div, I get a whole line of text with a lot of unnecessary data. Can someone tell me how to select this span, possibly by its class name?

Recommended answer

You need to load the page with a WebClient before you can query it, like this:

// WebClient is HtmlUnit's headless browser; getPage() fetches and parses the URL.
final WebClient web = new WebClient();
final HtmlPage page = web.getPage("http://www.whateveryouwant.com.br");

Get the elements by tag name and iterate over them:

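final List<DomElement> spans = page.getElementsByTagName("span");
for (DomElement element : spans) {
    // Compares the full class attribute, so order and spacing must match exactly.
    if (element.getAttribute("class").equals("tim tim-dep")) {
        // getNodeValue() is null for element nodes; getTextContent() returns "07:05".
        return element.getTextContent();
    }
}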

Or just use XPath:

// getFirstByXPath is generic (<T> T) and returns null when nothing matches.
final DomElement element = page.getFirstByXPath("//span[@class='tim tim-dep']");
final String text = element.getTextContent();
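
One caveat with the XPath above: @class='tim tim-dep' only matches when the attribute is exactly that string, so it breaks if the site reorders the classes or adds another one. A more tolerant predicate tests for the class token instead. The sketch below is not HtmlUnit code; it uses the JDK's built-in XPath engine on a small, made-up XHTML fragment (the SAMPLE markup, the class name SpanXPathDemo and the helper firstText are all hypothetical) just to show how the expression behaves. Since it is plain XPath 1.0, the same expression string should also be usable with page.getFirstByXPath, though I have not checked that against a live page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SpanXPathDemo {
    // Hypothetical sample markup; only the "tim"/"tim-dep" classes come from the question.
    public static final String SAMPLE =
        "<div id=\"row\">"
        + "<span class=\"tim tim-arr\">09:40</span>"
        + "<span class=\"tim-dep tim extra\">07:05</span>"
        + "</div>";

    // Matches a span whose class list contains the token "tim-dep", in any position.
    public static final String BY_TOKEN =
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' tim-dep ')]";

    // Returns the text of the first node matched by expr, or "" if nothing matches.
    public static String firstText(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // evaluate(String, Object) converts the result to a string:
        // the string value of the first matching node in document order.
        return xpath.evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        // Token-based match finds the departure time despite the reordered classes.
        System.out.println(firstText(SAMPLE, BY_TOKEN));
        // The exact-match expression from the answer finds nothing in this sample.
        System.out.println(firstText(SAMPLE, "//span[@class='tim tim-dep']").isEmpty());
    }
}
```

With this sample the first line printed is 07:05 and the second is true, because the exact-match predicate fails on the reordered class attribute. If the real page always emits class="tim tim-dep" verbatim, the simpler expression from the answer is enough.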
