HtmlUnit getByXpath returns null

Question

I am coding with Groovy; however, I don't believe these are language-specific questions.

I actually have two questions.

First question

I've run into an issue while using HtmlUnit. It is telling me that what I am trying to grab is null.

The page I am testing is: http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4

My code:

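import com.gargoylesoftware.htmlunit.BrowserVersion
import com.gargoylesoftware.htmlunit.WebClient

def url = "http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4"

def client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false   // do not execute the page's JavaScript

def page = client.getPage(url)

// coming up empty: getByXPath returns a List, and nothing matched
def title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")

println title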

This just prints out: []

Is this because the page uses onclick()? If so, how would I get around that? Enabling JavaScript creates a mess of output in my command prompt.

Second question

I also want to get the image, but I'm having trouble: when I try to get its XPath (via Firebug), it shows up as: //*[@id="gmi-ResViewSizer_img"]

How do I handle that?

Recommended answer

First answer:

/html/body/div[3]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a

Your XPath was off by one in the predicate filter for the 4th div of the body; it should be the 3rd div. It appears the site's HTML can and does change from when you originally snagged the XPath using Firebug. You may need to adjust your XPath to accommodate potential changes and make it less sensitive to differences in document structure.
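
To see why a single wrong index empties the whole result, here is a minimal sketch. It does not use HtmlUnit or the live deviantART page: it assumes a hypothetical, heavily simplified XHTML stand-in in which <body> has only three <div> children, and it evaluates the expressions with the JDK's javax.xml.xpath so it runs offline. The positional predicate div[n] is standard XPath 1.0, so the same rule should apply to getByXPath.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Hypothetical, simplified markup: <body> has only three <div> children.
def xhtml = '''<html><body>
  <div id="header"/>
  <div id="nav"/>
  <div id="content"><h1><a href="/art/1">Brush title</a></h1></div>
</body></html>'''

// Parse well-formed markup and return the text of every node the expression selects.
List<String> select(String markup, String expr) {
    def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(markup.getBytes('UTF-8')))
    def nodes = XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET)
    (0..<nodes.length).collect { nodes.item(it).textContent }
}

// div[4] asks for a fourth <div> child of <body>; there is none, so nothing matches.
println select(xhtml, '/html/body/div[4]/h1/a')   // prints []
// div[3] is the <div> that actually holds the <h1>.
println select(xhtml, '/html/body/div[3]/h1/a')   // prints [Brush title]
```

An empty list, rather than an exception, is what a non-matching path produces, which matches the [] printed in the question.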

Perhaps something like:

/html/body//div/h1/a
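
A small sketch of why the looser path survives layout changes: it compares a strict child-by-child path with the relaxed one on two hypothetical versions of the same markup, where the second version wraps the title in one extra <div>. As before, the markup is invented for illustration and the JDK's javax.xml.xpath stands in for HtmlUnit.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Parse well-formed markup and return the text of every node the expression selects.
List<String> select(String markup, String expr) {
    def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(markup.getBytes('UTF-8')))
    def nodes = XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET)
    (0..<nodes.length).collect { nodes.item(it).textContent }
}

// Two hypothetical versions of the same page; version B adds one wrapper <div>.
def versionA = '<html><body><div><div><h1><a>Brush title</a></h1></div></div></body></html>'
def versionB = '<html><body><div><div><div><h1><a>Brush title</a></h1></div></div></div></body></html>'

def strict  = '/html/body/div/div/h1/a'  // exactly one child step per nesting level
def relaxed = '/html/body//div/h1/a'     // any <div> below <body> that has the <h1> as a child

println select(versionA, strict)   // prints [Brush title]
println select(versionB, strict)   // prints []
println select(versionA, relaxed)  // prints [Brush title]
println select(versionB, relaxed)  // prints [Brush title]
```

The trade-off is that the relaxed path can also match any other <h1><a> pair under some <div> in the body, so it is worth checking how many nodes come back.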

Second answer: The XPath you listed will work. It may look odd or short (and may not be the most efficient), but // starts at the root node and searches every node in the tree, * matches any element (including the img), and the [] predicate filter restricts the matches to elements that have an id attribute whose value equals "gmi-ResViewSizer_img".
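
The pieces of that id-based expression can be checked in isolation. This sketch assumes a hypothetical page in which the preview image sits four <div>s deep and a second, unrelated <img> exists elsewhere; only the id value comes from the question. It uses the JDK's javax.xml.xpath rather than HtmlUnit, and reports each match as element name plus src.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Hypothetical markup: the target image is nested; a second <img> lives elsewhere.
def page = '''<html><body>
  <div><div><div><div>
    <img id="gmi-ResViewSizer_img" src="preview.jpg"/>
  </div></div></div></div>
  <img id="logo" src="logo.png"/>
</body></html>'''

// Return "elementName:src" for every element the expression selects.
List<String> select(String markup, String expr) {
    def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(markup.getBytes('UTF-8')))
    def nodes = XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET)
    (0..<nodes.length).collect { nodes.item(it).nodeName + ':' + nodes.item(it).getAttribute('src') }
}

// //* visits every element at any depth; the predicate keeps only the one with that id.
println select(page, '//*[@id="gmi-ResViewSizer_img"]')   // prints [img:preview.jpg]
// Naming the element instead of * gives the same result here.
println select(page, '//img[@id="gmi-ResViewSizer_img"]') // prints [img:preview.jpg]
```

Because the match depends only on the id, this form keeps working when the surrounding <div> nesting changes, as long as the site keeps that id.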

There are many other XPaths that could work as well; which one is best also depends on how often the HTML structure changes. Here is another one that selects that img on the referenced page:

/html/body/div/div/div/div/img[1]
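
One subtlety of this path: img[1] means "the first <img> child of each matching parent", not "the first image in the document", and the unindexed div steps match every <div> at that level. So on some pages it can return more than one node. The sketch below shows this on hypothetical markup with two four-deep <div> chains, using the JDK's javax.xml.xpath in place of HtmlUnit, and contrasts it with the parenthesized form (...)[1], which indexes into the whole result in document order.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Hypothetical markup: two four-deep <div> chains, each ending in image(s).
def page = '''<html><body>
  <div><div><div><div>
    <img src="preview.jpg"/>
    <img src="thumb.jpg"/>
  </div></div></div></div>
  <div><div><div><div>
    <img src="ad.gif"/>
  </div></div></div></div>
</body></html>'''

// Return the src attribute of every element the expression selects.
List<String> select(String markup, String expr) {
    def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(markup.getBytes('UTF-8')))
    def nodes = XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET)
    (0..<nodes.length).collect { nodes.item(it).getAttribute('src') }
}

// [1] applies per parent: the first <img> inside each innermost <div> (two matches:
// preview.jpg and ad.gif).
println select(page, '/html/body/div/div/div/div/img[1]')
// Parentheses apply [1] to the whole node-set: only the first image in document order.
println select(page, '(/html/body/div/div/div/div/img)[1]')   // prints [preview.jpg]
```

If the page has only one such chain, both forms select the same image; the difference only shows up when the structure repeats.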
