Python Splinter (SeleniumHQ) how to take a screenshot of many webpages? [Connection refused]

Problem description

I want to take a screenshot of many webpages, so I wrote this:

from splinter.browser import Browser
import urllib2
from urllib2 import URLError

urls = ['http://ubuntu.com/', 'http://xubuntu.org/']


try :
    browser = Browser('firefox')
    for i in range(0, len(urls)) :
        browser.visit(urls[i])
        if browser.status_code.is_success() :
            browser.driver.save_screenshot('your_screenshot' + str(i) + '.png')
        browser.quit()
except SystemError :
    print('install firefox!')
except urllib2.URLError, e:
    print(e)
    print('theres no such website')
except Exception, e :
    print(e)
    browser.quit()

I get this error:

<urlopen error [Errno 111] Connection refused>

How can I fix it? :)

Edit

When I have the links in a txt file, the code below doesn't work:

from splinter import Browser
import socket

urls = []
numbers = []

with open("urls.txt", 'r') as filename :
    for line in filename :
        line = line.strip()
        words = line.split("\t")
        numbers.append(str(words[0]))
        urls.append(str(words[1].rstrip()))

print(urls)

browser = None    
try:
    browser = Browser('firefox')
    for i, url in enumerate(urls, start=1):
        try:
            browser.visit(url)
            if browser.status_code.is_success():
                browser.driver.save_screenshot('your_screenshot_%03d.png' % i)
        except socket.gaierror, e:
            print "URL not found: %s" % url
finally:
    if browser is not None:
        browser.quit()

My txt file looks like this:

1   http//ubuntu.com/
2   http//xubuntu.org/
3   http//kubuntu.org/

When I ran it, I got these errors:

$ python test.py 
['http//ubuntu.com/', 'http//xubuntu.org/', 'http//kubuntu.org/']
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    browser.visit(url)
  File "/usr/local/lib/python2.7/dist-packages/splinter/driver/webdriver/__init__.py", line 79, in visit
    self.driver.get(url)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 168, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 156, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 147, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: u'Component returned failure code: 0x804b000a (NS_ERROR_MALFORMED_URI) [nsIIOService.newURI]'

What's wrong this time?
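
Note: the new error is a different issue. NS_ERROR_MALFORMED_URI points at the URLs themselves: every line of the txt file reads http// with no colon, so Firefox cannot parse them. A minimal sketch (the normalize_url helper here is hypothetical, not part of splinter) that repairs such URLs before the loop:

def normalize_url(url):
    # Repair a scheme that lost its colon, e.g. 'http//x' -> 'http://x'
    for scheme in ('http', 'https'):
        broken = scheme + '//'
        if url.startswith(broken):
            return scheme + '://' + url[len(broken):]
    return url

urls = [normalize_url(u) for u in urls]
print(urls)  # ['http://ubuntu.com/', 'http://xubuntu.org/', 'http://kubuntu.org/']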

Answer

Your problem is that you call browser.quit() inside your loop over the URLs, so the browser is no longer open when you visit the second URL.

Here's an updated version of your code:

from splinter import Browser
import socket

urls = ['http://ubuntu.com/', 'http://xubuntu.org/']

browser = None    
try:
    browser = Browser('firefox')
    for i, url in enumerate(urls, start=1):
        try:
            browser.visit(url)
            if browser.status_code.is_success():
                browser.driver.save_screenshot('your_screenshot_%03d.png' % i)
        except socket.gaierror, e:
            print "URL not found: %s" % url
finally:
    if browser is not None:
        browser.quit()

The major change is moving the browser.quit() call into the finally of your main exception handler, so that it runs no matter what goes wrong. Note also the use of enumerate to provide both the iterated value and its index; this is the recommended approach in Python over maintaining your own index pointer.
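
For illustration, a minimal standalone sketch (using a made-up URL list) contrasting a manual index pointer with enumerate:

urls = ['http://ubuntu.com/', 'http://xubuntu.org/']

# Manual index pointer, as in the original question:
i = 0
for url in urls:
    print('%03d: %s' % (i + 1, url))
    i += 1

# The idiomatic equivalent with enumerate:
for i, url in enumerate(urls, start=1):
    print('%03d: %s' % (i, url))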

I'm not sure if it's relevant for your code, but I found that splinter raised socket.gaierror exceptions rather than urllib2.URLError, so I've shown how to trap those as well. I also moved this exception handler inside the loop, so the remaining screenshots are still taken even if one or more of the URLs don't exist.
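
If you want to guard against either exception surfacing in your environment, here is a minimal sketch, in the same Python 2 syntax as the code above, that traps both; browser, url and i are assumed to come from the enclosing loop:

import socket
import urllib2

# browser, url and i are assumed to come from the enclosing for-loop
try:
    browser.visit(url)
    if browser.status_code.is_success():
        browser.driver.save_screenshot('your_screenshot_%03d.png' % i)
except (socket.gaierror, urllib2.URLError), e:
    # covers both DNS resolution failures and urllib2-level errors
    print "URL not reachable: %s (%s)" % (url, e)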
