Why this inconsistent behaviour when printing results in the scrapy shell?
Question
Load the scrapy shell:
scrapy shell "http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/"
Try a selector:
response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]')
Note: this prints the results.
But now use that selector in a for statement:
for row in response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]'):
    row.xpath(".//a[contains(@href, 'report')]/@href").extract_first()
Hit return twice and nothing is printed. To print results inside the for loop, you have to wrap the selector call in print, like so:
print(row.xpath(".//a[contains(@href, 'report')]/@href").extract_first())
Why?
Edit
If I do exactly the same thing as in Liam's post below, my output is this:
rmp:www rmp$ scrapy shell "http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/"
2016-03-05 06:13:28 [scrapy] INFO: Scrapy 1.0.5 started (bot: scrapybot)
2016-03-05 06:13:28 [scrapy] INFO: Optional features available: ssl, http11
2016-03-05 06:13:28 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-03-05 06:13:28 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
2016-03-05 06:13:28 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-03-05 06:13:28 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-03-05 06:13:28 [scrapy] INFO: Enabled item pipelines:
2016-03-05 06:13:28 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-03-05 06:13:28 [scrapy] INFO: Spider opened
2016-03-05 06:13:29 [scrapy] DEBUG: Crawled (200) <GET http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x108c89c10>
[s] item {}
[s] request <GET http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/>
[s] response <200 http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/>
[s] settings <scrapy.settings.Settings object at 0x10a25bb10>
[s] spider <DefaultSpider 'default' at 0x10c1201d0>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
2016-03-05 06:13:29 [root] DEBUG: Using default logger
2016-03-05 06:13:29 [root] DEBUG: Using default logger
In [1]: for row in response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]'):
...: row.xpath(".//a[contains(@href, 'report')]/@href").extract_first()
...:
But with print added:
In [2]: for row in response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]'):
...: print row.xpath(".//a[contains(@href, 'report')]/@href").extract_first()
...:
/report/premier-league-2015-2016-manchester-united-tottenham-hotspur/
/report/premier-league-2015-2016-afc-bournemouth-aston-villa/
/report/premier-league-2015-2016-everton-fc-watford-fc/
/report/premier-league-2015-2016-leicester-city-sunderland-afc/
/report/premier-league-2015-2016-norwich-city-crystal-palace/
Recommended answer
This works for me.
>>>scrapy shell "http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/"
>>> for row in response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]'):
... row.xpath(".//a[contains(@href, 'report')]/@href").extract_first()
...
u'/report/premier-league-2015-2016-manchester-united-tottenham-hotspur/'
u'/report/premier-league-2015-2016-afc-bournemouth-aston-villa/'
u'/report/premier-league-2015-2016-everton-fc-watford-fc/'
u'/report/premier-league-2015-2016-leicester-city-sunderland-afc/'
u'/report/premier-league-2015-2016-norwich-city-crystal-palace/'
u'/report/premier-league-2015-2016-chelsea-fc-swansea-city/'
u'/report/premier-league-2015-2016-arsenal-fc-west-ham-united/'
u'/report/premier-league-2015-2016-newcastle-united-southampton-fc/'
u'/report/premier-league-2015-2016-stoke-city-liverpool-fc/'
u'/report/premier-league-2015-2016-west-bromwich-albion-manchester-city/'
Does this not show the same results for you?
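A likely explanation, judging only from the prompts in the two transcripts: the session in this answer shows the plain Python prompt (>>>), while the session in the question shows the IPython prompt (In [1]:). The plain interpreter compiles each interactive statement in "single" mode, where every expression statement, including one nested in a for loop, is passed to sys.displayhook. IPython by default displays only the value of the last top-level expression, so expressions inside a loop body are evaluated silently. The sketch below illustrates the difference using only the standard library. The "exec" mode stands in for "nothing inside the loop is displayed" and is not IPython's actual implementation; SOURCE and run are hypothetical names chosen for the demonstration.

```python
import sys

# A loop whose body is a bare expression, like the xpath call in the question.
SOURCE = "for i in range(3):\n    i * 2\n"


def run(mode):
    """Compile SOURCE in the given mode and collect what gets displayed."""
    shown = []
    previous_hook = sys.displayhook
    sys.displayhook = shown.append  # capture values instead of printing them
    try:
        exec(compile(SOURCE, "<demo>", mode), {})
    finally:
        sys.displayhook = previous_hook  # always restore the original hook
    return shown


# "single" is what the plain interactive interpreter uses:
# each expression statement in the loop body reaches the display hook.
print(run("single"))

# "exec" is how ordinary scripts are compiled:
# the expression values are computed and then discarded.
print(run("exec"))
```

Under that assumption, wrapping the call in print is the portable fix, since it behaves the same in both shells.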