Python Splinter (SeleniumHQ): how to take a screenshot of many webpages? [Connection refused]
Question
I want to take a screenshot of many webpages, so I wrote this:
from splinter.browser import Browser
import urllib2
from urllib2 import URLError

urls = ['http://ubuntu.com/', 'http://xubuntu.org/']

try:
    browser = Browser('firefox')
    for i in range(0, len(urls)):
        browser.visit(urls[i])
        if browser.status_code.is_success():
            browser.driver.save_screenshot('your_screenshot' + str(i) + '.png')
        browser.quit()
except SystemError:
    print('install firefox!')
except urllib2.URLError, e:
    print(e)
    print('theres no such website')
except Exception, e:
    print(e)
    browser.quit()
I get this error:
<urlopen error [Errno 111] Connection refused>
How can I fix it? :)
EDIT
When I have the links in a txt file, the code below doesn't work:
from splinter import Browser
import socket

urls = []
numbers = []

with open("urls.txt", 'r') as filename:
    for line in filename:
        line = line.strip()
        words = line.split("\t")
        numbers.append(str(words[0]))
        urls.append(str(words[1].rstrip()))
print(urls)

browser = None
try:
    browser = Browser('firefox')
    for i, url in enumerate(urls, start=1):
        try:
            browser.visit(url)
            if browser.status_code.is_success():
                browser.driver.save_screenshot('your_screenshot_%03d.png' % i)
        except socket.gaierror, e:
            print "URL not found: %s" % url
finally:
    if browser is not None:
        browser.quit()
My txt file looks like this:
1 http//ubuntu.com/
2 http//xubuntu.org/
3 http//kubuntu.org/
When I ran it, I got these errors:
$ python test.py
['http//ubuntu.com/', 'http//xubuntu.org/', 'http//kubuntu.org/']
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    browser.visit(url)
  File "/usr/local/lib/python2.7/dist-packages/splinter/driver/webdriver/__init__.py", line 79, in visit
    self.driver.get(url)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 168, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 156, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 147, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: u'Component returned failure code: 0x804b000a (NS_ERROR_MALFORMED_URI) [nsIIOService.newURI]'
What's wrong this time?
Answer
Your problem is that you do browser.quit() inside of your loop through the URLs, so the browser is no longer open for the second URL.
Here's an updated version of your code:
from splinter import Browser
import socket

urls = ['http://ubuntu.com/', 'http://xubuntu.org/']

browser = None
try:
    browser = Browser('firefox')
    for i, url in enumerate(urls, start=1):
        try:
            browser.visit(url)
            if browser.status_code.is_success():
                browser.driver.save_screenshot('your_screenshot_%03d.png' % i)
        except socket.gaierror, e:
            print "URL not found: %s" % url
finally:
    if browser is not None:
        browser.quit()
The major change is moving the browser.quit() code into your main exception handler's finally clause, so that it will happen no matter what goes wrong. Note also the use of enumerate to provide both the iterated value and its index; this is the recommended approach in Python over maintaining your own index pointer.
I'm not sure if it's relevant for your code, but I found that splinter raised socket.gaierror exceptions rather than urllib2.URLError, so I showed how you can trap those as well. I also moved this exception handler inside the loop, so the script continues to grab the remaining screenshots even if one or more of the URLs don't exist.
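Regarding the NS_ERROR_MALFORMED_URI error in your edit: the entries in your txt file are missing the colon after http (http// instead of http://), so Firefox cannot parse them as URLs. As a sketch (the helper name has_valid_scheme is my own, not from splinter), you could check each line with the standard library's urlparse before calling browser.visit and skip malformed entries:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

def has_valid_scheme(url):
    # 'http//ubuntu.com/' parses with an empty scheme (the whole string
    # lands in the path component), so the missing-colon typo is caught here.
    return urlparse(url).scheme in ('http', 'https')

urls = ['http//ubuntu.com/', 'http://xubuntu.org/']
valid = [u for u in urls if has_valid_scheme(u)]
print(valid)  # only 'http://xubuntu.org/' survives
```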