HtmlUnit getByXpath returns null

Question

I am coding in Groovy, but I don't believe these questions are language-specific.

I actually have two questions.

First question

I've run into an issue while using HtmlUnit: it is telling me that what I am trying to grab is null.

The page I'm testing against is: http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4

My code:

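import com.gargoylesoftware.htmlunit.BrowserVersion
import com.gargoylesoftware.htmlunit.WebClient

// The page under test (the URL given above)
url = "http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4"

client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false

page = client.getPage(url)

// coming up as null
title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")

println title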

This just prints out: []

Is this because the page uses onclick()? If so, how would I get around that? Enabling JavaScript creates a mess in my command prompt.

Second question

I also want to get the image, but I'm having trouble because when I grab the XPath (via Firebug) it shows up as: //*[@id="gmi-ResViewSizer_img"]

How would I go about this?

Recommended answer

First answer:

/html/body/div[3]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a

Your XPath was off by one in the positional predicate on the body's child div: it selects the 4th div, but it should be the 3rd. It appears the site's HTML can and does change after you originally captured the XPath with Firebug. You may need to adjust your XPath to accommodate potential changes and make it less sensitive to differences in document structure.

Perhaps something like this:

/html/body//div/h1/a
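
To illustrate why the looser path tolerates layout changes, here is a minimal sketch. It uses plain Java with the JDK's built-in javax.xml.xpath API rather than Groovy and HtmlUnit, so that it runs without extra dependencies. The class name XPathRobustness and both markup snippets are made up for the example and are far simpler than the real page.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathRobustness {
    // Hypothetical markup: the heading link sits in the 2nd div under body.
    static final String BEFORE =
        "<html><body><div/><div><h1><a>Brush Pack</a></h1></div></body></html>";
    // The same page after the site inserts one extra div at the top of body.
    static final String AFTER =
        "<html><body><div/><div/><div><h1><a>Brush Pack</a></h1></div></body></html>";

    // Returns how many nodes the XPath expression selects in the markup.
    static int count(String markup, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(markup)),
                XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or markup", e);
        }
    }

    public static void main(String[] args) {
        String positional = "/html/body/div[2]/h1/a"; // tied to an exact index
        String loose = "/html/body//div/h1/a";        // any div below body

        System.out.println(count(BEFORE, positional)); // 1
        System.out.println(count(AFTER, positional));  // 0: index is now off by one
        System.out.println(count(BEFORE, loose));      // 1
        System.out.println(count(AFTER, loose));       // 1
    }
}
```

An XPath that matches nothing yields an empty node list rather than an error, which is consistent with the [] printed in the question.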

Second answer: The XPath that you listed will work. It may look odd or short (and may not be the most efficient), but // starts at the root node and searches every node in the tree, * matches any element (including the img), and the [] predicate filter restricts the result to elements that have an id attribute whose value equals "gmi-ResViewSizer_img".
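
As a rough illustration of the id-based lookup, the sketch below evaluates the same kind of expression with the JDK's javax.xml.xpath API (plain Java, not HtmlUnit). The class name XPathById, the helper srcOf, the markup, and the src values are all hypothetical. The expression uses single quotes around the id value, which XPath also accepts; this avoids escaping double quotes inside a double-quoted Java or Groovy string, in case that was the source of the trouble.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XPathById {
    // Hypothetical markup with two images; only one carries the wanted id.
    static final String PAGE = "<html><body><div><div>"
        + "<img id=\"thumb\" src=\"small.png\"/>"
        + "<img id=\"gmi-ResViewSizer_img\" src=\"full.png\"/>"
        + "</div></div></body></html>";

    // Returns the src attribute of the first element the expression selects,
    // or null when nothing matches.
    static String srcOf(String markup, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            Element img = (Element) xpath.evaluate(
                expression, new InputSource(new StringReader(markup)),
                XPathConstants.NODE);
            return img == null ? null : img.getAttribute("src");
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or markup", e);
        }
    }

    public static void main(String[] args) {
        // Single quotes inside the XPath, so no escaping is needed.
        System.out.println(srcOf(PAGE, "//*[@id='gmi-ResViewSizer_img']")); // full.png
    }
}
```

Because the predicate keys on the id rather than on position, the lookup does not depend on how deeply the img is nested.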

Many other XPath expressions could work as well; which one is best depends on how often the HTML structure changes. This one also selects that img on the referenced page:

/html/body/div/div/div/div/img[1]
