Scrapy shell Error

Problem description

I am a newbie to Scrapy and am going through the tutorials. I ran this command and got an error.

C:\Users\Sandra\Anaconda>scrapy shell 'http://scrapy.org'

In particular, what is this URLError: <urlopen error [Errno 10051] A socket operation was attempted to an unreachable network>?

Full error message:

2015-08-20 23:35:08 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
2015-08-20 23:35:08 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-08-20 23:35:08 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2015-08-20 23:35:10 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
2015-08-20 23:35:10 [boto] DEBUG: Retrieving credentials from metadata server.
2015-08-20 23:35:10 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "C:\Users\Sandra\Anaconda\lib\site-packages\boto\utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 449, in _open
    '_open', req)
  File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Users\Sandra\Anaconda\lib\urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10051] A socket operation was attempted to an unreachable network>
2015-08-20 23:35:10 [boto] ERROR: Unable to read instance data, giving up
2015-08-20 23:35:10 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-08-20 23:35:10 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-08-20 23:35:10 [scrapy] INFO: Enabled item pipelines:
2015-08-20 23:35:10 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
Traceback (most recent call last):
  File "C:\Users\Sandra\Anaconda\Scripts\scrapy-script.py", line 5, in <module>
    sys.exit(execute())
  File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\commands\shell.py", line 63, in run
    shell.start(url=url)
  File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\shell.py", line 44, in start
    self.fetch(url, spider)
  File "C:\Users\Sandra\Anaconda\lib\site-packages\scrapy\shell.py", line 81, in fetch
    url = any_to_uri(request_or_url)
  File "C:\Users\Sandra\Anaconda\lib\site-packages\w3lib\url.py", line 232, in any_to_uri
    return uri_or_path if u.scheme else path_to_file_uri(uri_or_path)
  File "C:\Users\Sandra\Anaconda\lib\site-packages\w3lib\url.py", line 213, in path_to_file_uri
    x = moves.urllib.request.pathname2url(os.path.abspath(path))
  File "C:\Users\Sandra\Anaconda\lib\nturl2path.py", line 58, in pathname2url
    raise IOError, error
Error: Bad path: C:\Users\Sandra\Anaconda\'http:\scrapy.org'

Here is the list of packages installed:

# packages in environment at C:\Users\Sandra\Anaconda:
#
_license 1.1 py27_0
alabaster 0.7.3 py27_0
anaconda 2.3.0 np19py27_0
argcomplete 0.8.9 py27_0
astropy 1.0.3 np19py27_0
babel 1.3 py27_0
backports.ssl-match-hostname 3.4.0.2
bcolz 0.9.0 np19py27_0
beautiful-soup 4.3.2 py27_1
beautifulsoup4 4.3.2
binstar 0.11.0 py27_0
bitarray 0.8.1 py27_1
blaze 0.8.0
blaze-core 0.8.0 np19py27_0
blz 0.6.2 np19py27_1
bokeh 0.9.0 np19py27_0
boto 2.38.0 py27_0
bottleneck 1.0.0 np19py27_0
cdecimal 2.3 py27_1
certifi 14.05.14 py27_0
cffi 1.1.2 py27_0
characteristic 14.3.0
clyent 0.3.4 py27_0
colorama 0.3.3 py27_0
conda 3.16.0 py27_0
conda-build 1.14.0 py27_0
conda-env 2.4.2 py27_0
configobj 5.0.6 py27_0
crcmod 1.7
cryptography 0.9.3 py27_0
cssselect 0.9.1 py27_0
cython 0.22.1 py27_0
cytoolz 0.7.3 py27_0
datashape 0.4.5 np19py27_0
decorator 3.4.2 py27_0
docopt 0.6.2
docutils 0.12 py27_1
dynd-python 0.6.5 np19py27_0
enum34 1.0.4 py27_0
fastcache 1.0.2 py27_0
filechunkio 1.6
flask 0.10.1 py27_1
funcsigs 0.4 py27_0
futures 3.0.2 py27_0
gcs-oauth2-boto-plugin 1.9
gevent 1.0.1 py27_0
gevent-websocket 0.9.3 py27_0
google-api-python-client 1.4.0
google-apitools 0.4.3
greenlet 0.4.7 py27_0
grin 1.2.1 py27_2
gsutil 4.12
h5py 2.5.0 np19py27_1
hdf5 1.8.15.1 2
httplib2 0.9.1
idna 2.0 py27_0
ipaddress 1.0.7 py27_0
ipython 3.2.0 py27_0
ipython-notebook 3.2.0 py27_0
ipython-qtconsole 3.2.0 py27_0
itsdangerous 0.24 py27_0
jdcal 1.0 py27_0
jedi 0.8.1 py27_0
jinja2 2.7.3 py27_2
jsonschema 2.4.0 py27_0
launcher 1.0.0 1
llvmlite 0.5.0 py27_0
lxml 3.4.4 py27_0
markupsafe 0.23 py27_0
matplotlib 1.4.3 np19py27_1
menuinst 1.0.4 py27_0
mistune 0.5.1 py27_1
mock 1.0.1 py27_0
mrjob 0.4.4
multipledispatch 0.4.7 py27_0
networkx 1.9.1 py27_0
nltk 3.0.3 np19py27_0
node-webkit 0.10.1 0
nose 1.3.7 py27_0
numba 0.19.1 np19py27_0
numexpr 2.4.3 np19py27_0
numpy 1.9.2 py27_0
oauth2client 1.4.7
odo 0.3.2 np19py27_0
openpyxl 1.8.5 py27_0
pandas 0.16.2 np19py27_0
patsy 0.3.0 np19py27_0
pattern 2.6
pbs 0.110
pep8 1.6.2 py27_0
pillow 2.8.2 py27_0
pip 7.1.0 py27_1
ply 3.6 py27_0
protorpc 0.10.0
psutil 2.2.1 py27_0
py 1.4.27 py27_0
pyasn1 0.1.7 py27_0
pyasn1-modules 0.0.5
pycosat 0.6.1 py27_0
pycparser 2.14 py27_0
pycrypto 2.6.1 py27_3
pyflakes 0.9.2 py27_0
pygments 2.0.2 py27_0
pyopenssl 0.15.1 py27_1
pyparsing 2.0.3 py27_0
pyqt 4.10.4 py27_1
pyreadline 2.0 py27_0
pytables 3.2.0 np19py27_0
pytest 2.7.1 py27_0
python 2.7.9 1
python-dateutil 2.4.2 py27_0
python-gflags 2.0
pytz 2015.4 py27_0
pywin32 219 py27_0
pyyaml 3.11 py27_1
pyzmq 14.7.0 py27_0
queuelib 1.2.2 py27_0
requests 2.7.0 py27_0
retry-decorator 1.0.0
rodeo 0.2.3
rope 0.9.4 py27_1
rsa 3.1.4
runipy 0.1.3 py27_0
scikit-image 0.11.3 np19py27_0
scikit-learn 0.16.1 np19py27_0
scipy 0.15.1 np19py27_0
scrapy 1.0.3
seaborn 0.5.1 np19py27_0
service-identity 14.0.0
setuptools 18.1 py27_0
simplejson 3.6.5
six 1.9.0 py27_0
snowballstemmer 1.2.0 py27_0
sockjs-tornado 1.0.1 py27_0
socksipy-branch 1.1
sphinx 1.3.1 py27_0
sphinx-rtd-theme 0.1.7
sphinx_rtd_theme 0.1.7 py27_0
spyder 2.3.5.2 py27_0
spyder-app 2.3.5.2 py27_0
sqlalchemy 1.0.5 py27_0
ssl_match_hostname 3.4.0.2 py27_0
statsmodels 0.6.1 np19py27_0
sympy 0.7.6 py27_0
tables 3.2.0
toolz 0.7.2 py27_0
tornado 4.2 py27_0
twisted 15.3.0 py27_0
ujson 1.33 py27_0
unicodecsv 0.9.4 py27_0
uritemplate 0.6
w3lib 1.12.0 py27_0
werkzeug 0.10.4 py27_0
wheel 0.24.0 py27_0
xlrd 0.9.3 py27_0
xlsxwriter 0.7.3 py27_0
xlwings 0.3.5 py27_0
xlwt 1.0.0 py27_0
zlib 1.2.8 0
zope.interface 4.1.2 py27_1

Solution

That particular error message is being generated by boto (boto 2.38.0 py27_0), which is used to connect to Amazon S3. Scrapy doesn't have this enabled by default.

If you're just going through the tutorial, and haven't done anything other than what you've been instructed to do, then it could be a configuration problem. Launching Scrapy with the shell command will still use the configuration and the associated settings file. By default, Scrapy will look in the following locations (a minimal scrapy.cfg sketch follows the list):

  1. /etc/scrapy.cfg or c:\scrapy\scrapy.cfg (system-wide),
  2. ~/.config/scrapy.cfg ($XDG_CONFIG_HOME) and ~/.scrapy.cfg ($HOME) for global (user-wide) settings, and
  3. scrapy.cfg inside a Scrapy project's root directory.
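
For reference, the project-level scrapy.cfg from item 3 is a small INI-style file whose main job is to point at your settings module. A minimal sketch, where the package name myproject is a hypothetical placeholder:

[settings]
default = myproject.settings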

EDIT: In reply to the comments, this appears to be a bug with Scrapy when boto is present (bug here).

In response to "how to disable the download handler": add the following to your settings.py file:

DOWNLOAD_HANDLERS = {
    's3': None,
}

Your settings.py file should be in the root of your Scrapy project folder (one level deeper than your scrapy.cfg file).

If you've already got DOWNLOAD_HANDLERS in your settings.py file, just add a new entry for 's3' with a None value.
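
For example, if your settings.py already defined other handlers, the merged dict might look like the sketch below; the 'ftp' entry and its handler path are hypothetical placeholders, and only the 's3': None line is the actual fix:

DOWNLOAD_HANDLERS = {
    'ftp': 'myproject.handlers.MyFtpHandler',  # hypothetical pre-existing entry
    's3': None,  # disable the S3 handler so boto is never invoked
}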

EDIT 2: I'd highly recommend looking at setting up virtual environments for your projects. Look into virtualenv and its usage. I'd make this recommendation regardless of the packages used for a project, but doubly so given the very large number of packages in your environment.
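
A rough sketch of that setup on Windows, assuming pip is available (the environment name scrapy_env is just a placeholder):

C:\Users\Sandra>pip install virtualenv
C:\Users\Sandra>virtualenv scrapy_env
C:\Users\Sandra>scrapy_env\Scripts\activate
(scrapy_env) C:\Users\Sandra>pip install scrapy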
