python - Scrapy spider won't keep crawling in a loop?

Views: 302 Published: 2017/9/6 0:17:27 python scrapy


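Problem description

Question

My Scrapy spider only collects the links on a single page; it does not keep running until it has crawled the whole site. The code is below. I'm a beginner and would appreciate some guidance.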

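import random
from urllib.parse import urljoin

import scrapy

# DoubanbookItem is presumably defined in the project's items.py;
# its import is not shown in the original post.


class DbbookSpider(scrapy.Spider):
    name = "imufe"
    allowed_domains = ['http://www.imufe.edu.cn/']
    start_urls = ['http://www.imufe.edu.cn/main/dtxw/201704/t20170414_127035.html']

    def parse(self, response):
        selector = scrapy.Selector(response)
        print(selector)
        books = selector.xpath('//a/@href').extract()
        # Resolve every href against the URL of the current page.
        link = [urljoin(response.url, each) for each in books]
        for each in link:
            # Yield a fresh item per link instead of mutating one shared item.
            item = DoubanbookItem()
            item['link'] = each
            yield item
        # Follow one randomly chosen link, if the page contained any.
        if link:
            nextPage = random.choice(link)
            yield scrapy.http.Request(nextPage, callback=self.parse)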

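Solution

Thanks, everyone, for the answers. I posted the question on the 16th, and the replies started coming in about 10 days later. After a lot of searching, I found that the error was in this line of my code: allowed_domains = ['http://www.imufe.edu.cn/']. The entries in allowed_domains must be written without the 'http://' prefix; once I removed it, the spider ran normally. Thanks again to everyone who answered!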
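To see why the prefix matters, here is a minimal sketch using only the standard library. It assumes that an offsite check compares each request's hostname against the allowed_domains entries, accepting an exact match or a subdomain. The helper name host_is_allowed is made up for illustration and is not Scrapy's actual code; it only approximates the idea.

```python
from urllib.parse import urlparse


def host_is_allowed(url, allowed_domains):
    """Hypothetical, simplified stand-in for an offsite check (not Scrapy's code).

    A URL is kept when its hostname equals an allowed domain
    or is a subdomain of one.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in allowed_domains
    )


page = "http://www.imufe.edu.cn/main/dtxw/201704/t20170414_127035.html"

# Entry written as a URL, as in the question: a hostname can never equal it.
print(host_is_allowed(page, ["http://www.imufe.edu.cn/"]))  # False

# Entries written as bare domains: the hostname matches.
print(host_is_allowed(page, ["www.imufe.edu.cn"]))  # True
print(host_is_allowed(page, ["imufe.edu.cn"]))      # True (subdomain match)
```

In the spider itself the fix is a one-line change, for example allowed_domains = ['imufe.edu.cn'] or allowed_domains = ['www.imufe.edu.cn']. With the URL-style entry, every link extracted in parse looks offsite and is dropped, which would explain why only the first page was ever processed.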
