Invoking an onclick event with BeautifulSoup in Python


Question

I am trying to fetch the links to all accommodations in Cyprus from this website: http://www.zoover.nl/cyprus

So far I can retrieve the first 15, which are already shown. Now I have to invoke a click on the "volgende" ("next") link. However, I don't know how to do that, and in the page source I cannot track down the function that gets called, so I cannot use something like what is posted here: Issues with invoking "on click event" on the html page using beautiful soup in Python

I only need the step where the "clicking" happens, so that I can fetch the next 15 links, and so on.

Does anybody know how to help? Thanks in advance!

My code looks like this now:

def getZooverLinks(country):
    zooverWeb = "http://www.zoover.nl/"
    url = zooverWeb + country
    parsedZooverWeb = parseURL(url)
    driver = webdriver.Firefox()
    driver.get(url)

    button = driver.find_element_by_class_name("next")
    links = []
    for page in xrange(1,3):
        for item in parsedZooverWeb.find_all(attrs={'class': 'blue2'}):
            for link in item.find_all('a'):
                newLink = zooverWeb + link.get('href')
                links.append(newLink)
        button.click()

and I get the following error:

selenium.common.exceptions.StaleElementReferenceException: Message: Element is no longer attached to the DOM
Stacktrace:
    at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:8956)
    at Utils.getElementAt (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:8546)
    at fxdriver.preconditions.visible (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:9585)
    at DelayedCommand.prototype.checkPreconditions_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12257)
    at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12274)
    at DelayedCommand.prototype.executeInternal_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12279)
    at DelayedCommand.prototype.execute/< (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12221)

I am quite confused :/

Recommended answer

While it might be tempting to look for a way to do this with BeautifulSoup itself, it cannot execute JavaScript or fire click events: in the end BeautifulSoup is a parser rather than an interactive web browsing client.

You should seriously consider solving this with Selenium, as briefly shown in this answer. There are pretty good Python bindings available for Selenium.

You could just use Selenium to find the element and click it, then pass the resulting page on to BeautifulSoup and use your existing code to fetch the links.
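The click-then-parse loop described above could be sketched roughly as follows. This is a minimal sketch, not a tested scraper for the site: the helper names `extract_links`, `collect_links`, and `scrape_cyprus` are made up for illustration, the `blue2` and `next` class names are taken from the question's code, and it assumes the page has finished updating by the time it is read again after a click. Both the page source and the "next" button are looked up again on every iteration, so no reference obtained on an earlier page is reused.

```python
from urllib.parse import urljoin

# Same value as selenium's (By.CLASS_NAME, "next"); spelled out as a plain
# tuple so this module can be imported without selenium installed.
NEXT_BUTTON = ("class name", "next")


def extract_links(html, base_url):
    """Parse one page of HTML and return absolute links (hypothetical helper)."""
    # Imported lazily so the paging loop below can be tried without bs4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    links = []
    # 'blue2' is the class used in the question's code.
    for item in soup.find_all(attrs={"class": "blue2"}):
        for anchor in item.find_all("a", href=True):
            links.append(urljoin(base_url, anchor["href"]))
    return links


def collect_links(driver, base_url, pages, extract=extract_links):
    """Collect links from `pages` consecutive result pages."""
    links = []
    for page in range(pages):
        # Re-read the live DOM each time instead of parsing the page once.
        links.extend(extract(driver.page_source, base_url))
        if page < pages - 1:
            # Find the button again on every page: an element found before
            # the page changed may no longer be attached to the DOM.
            driver.find_element(*NEXT_BUTTON).click()
    return links


def scrape_cyprus(pages=3):
    """Example wiring with a real browser (requires selenium and Firefox)."""
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get("http://www.zoover.nl/cyprus")
        return collect_links(driver, "http://www.zoover.nl/", pages)
    finally:
        driver.quit()
```

Calling `scrape_cyprus()` would open a browser and return the collected links. If the list is updated asynchronously after the click, an explicit wait between the click and the next read of `driver.page_source` would probably be needed; that is left out here.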

Alternatively, you could use the JavaScript that is listed in the onclick handler. I pulled this from the source: EntityQuery('Ns=pPopularityScore%7c1&No=30&props=15292&dims=530&As=&N=0+3+10500915');. The No parameter increases by 15 for each page, but the props parameter has me guessing. I'd recommend not getting into this, though, and just interacting with the website as a client would, using Selenium. That is also much more robust to changes on their side.
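To make the paging pattern concrete, here is a small sketch that only rewrites the `No` offset in the query string quoted above. The function names `query_for_offset` and `page_queries` are hypothetical, the starting offset is a parameter because the source does not say which page `No=30` belongs to, and `props` is left untouched since its meaning is unknown. How the resulting string would be sent to the server is not covered, because the source does not show what `EntityQuery` does with it.

```python
import re

# Query string copied verbatim from the onclick handler quoted in the answer.
QUERY = "Ns=pPopularityScore%7c1&No=30&props=15292&dims=530&As=&N=0+3+10500915"
PAGE_SIZE = 15  # the answer states that No grows by 15 per page


def query_for_offset(query, offset):
    """Return `query` with its No parameter set to `offset`."""
    new_query, count = re.subn(r"\bNo=\d+", "No=%d" % offset, query, count=1)
    if count == 0:
        raise ValueError("query has no No parameter")
    return new_query


def page_queries(query, pages, start=0):
    """Build one query string per page, stepping No by PAGE_SIZE."""
    return [query_for_offset(query, start + i * PAGE_SIZE) for i in range(pages)]


if __name__ == "__main__":
    for q in page_queries(QUERY, pages=3):
        print(q)
```

Everything except the `No` value is kept byte for byte, so the escaping used by the site (`%7c`, `+`) is not altered.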
