How to get the text inside a specific span with HtmlUnit
Question
I'm new to HtmlUnit and I'm not even sure it is the right tool for my project. I'm trying to parse a website and extract the values I need from it. I need to get the value "07:05" from this element:
<span class="tim tim-dep">07:05</span>
I know that I can use getTextContent() to extract the value, but I don't know how to select a specific span. I used getElementById to find the <div> tag that this span belongs to, but when I get the text content of that div, I get a whole line of text with a lot of unnecessary data. Can someone tell me how to select this span, possibly using the class name?
Recommended answer
You need to load the page first, like this:
// WebClient is HtmlUnit's browser object; HtmlPage is the loaded document.
final WebClient web = new WebClient();
final HtmlPage page = web.getPage("http://www.whateveryouwant.com.br");
Get the elements by tag name and iterate over them:
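final List<DomElement> spans = page.getElementsByTagName("span");
for (DomElement element : spans) {
    // getAttribute returns the raw class string, so this is an exact match.
    if (element.getAttribute("class").equals("tim tim-dep")) {
        // getNodeValue() is null for element nodes; use getTextContent().
        return element.getTextContent();
    }
}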
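The equals comparison above only matches when the class attribute is exactly "tim tim-dep". If the site ever reorders the classes or adds another one, it silently stops matching. A minimal sketch of a more tolerant check, in plain Java with no HtmlUnit dependency, could look like this. The class name ClassTokens and the method hasAllClasses are hypothetical helpers invented for illustration, not part of HtmlUnit.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ClassTokens {
    // Hypothetical helper: true when the class attribute value contains every
    // wanted class name, regardless of order or extra whitespace.
    static boolean hasAllClasses(String classAttribute, String... wanted) {
        if (classAttribute == null || classAttribute.trim().isEmpty()) {
            return false;
        }
        // Split the attribute on runs of whitespace to get the individual class names.
        Set<String> tokens = new HashSet<>(Arrays.asList(classAttribute.trim().split("\\s+")));
        return tokens.containsAll(Arrays.asList(wanted));
    }

    public static void main(String[] args) {
        System.out.println(hasAllClasses("tim tim-dep", "tim", "tim-dep"));            // true
        System.out.println(hasAllClasses("tim-dep  tim highlight", "tim", "tim-dep")); // true
        System.out.println(hasAllClasses("tim tim-arr", "tim", "tim-dep"));            // false
    }
}
```

In the loop, the condition would then become hasAllClasses(element.getAttribute("class"), "tim", "tim-dep").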
Or just use XPath:
// getFirstByXPath returns the first matching node, or null if nothing matches.
final DomElement element = page.getFirstByXPath("//span[@class='tim tim-dep']");
final String text = element != null ? element.getTextContent() : null;
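Note that @class='tim tim-dep' compares the whole attribute string, so it has the same exact-match limitation as equals. The sketch below illustrates the difference between that expression and a token-based one. It uses the JDK's built-in javax.xml.xpath API on a small well-formed sample instead of HtmlUnit, so it runs without extra libraries. The sample markup, the "09:40" value and the class name SpanXPathDemo are assumptions made up for the demonstration. The same expression strings should be usable with getFirstByXPath, though that is not verified here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class SpanXPathDemo {
    // Exact match on the whole class attribute, as in the answer.
    static final String EXACT = "//span[@class='tim tim-dep']";
    // Token match: pads the normalized attribute with spaces and looks for " tim-dep ".
    static final String TOKEN =
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' tim-dep ')]";

    // Hypothetical sample markup; the second sample has the classes reordered.
    static final String ORIGINAL =
        "<div><span class=\"tim tim-arr\">09:40</span><span class=\"tim tim-dep\">07:05</span></div>";
    static final String REORDERED =
        "<div><span class=\"tim tim-arr\">09:40</span><span class=\"tim-dep tim\">07:05</span></div>";

    // Returns the text of the first node matching the expression, or "" if none matches.
    static String evaluate(String expression, String markup) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(markup)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(EXACT, ORIGINAL));            // prints 07:05
        System.out.println(evaluate(EXACT, REORDERED).isEmpty()); // prints true (no match)
        System.out.println(evaluate(TOKEN, REORDERED));           // prints 07:05
    }
}
```

The token form is longer but keeps working if the page adds or reorders class names on the span.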