Scrapy request not passing to callback when 301?

Question

I'm trying to update a database full of links to external websites. For some reason, the callback is skipped when the request (headers/website/whatever) comes back with a 301 Moved Permanently status.

import scrapy
from scrapy import log  # Scrapy 0.24 logging module (deprecated in Scrapy 1.0)

def start_requests(self):

    # ... database stuff

    for x in xrange(0, numrows):
        row = cur.fetchone()

        item = exampleItem()

        item['real_id'] = row[0]
        item['product_id'] = row[1]
        url = "http://www.example.com/a/-" + item['real_id'] + ".htm"
        log.msg("item %d request URL is %s" % (item['product_id'], url), log.INFO)  # shows the correct URL
        request = scrapy.Request(url, callback=self.parse_url)
        request.meta['item'] = item
        yield request

def parse_url(self, response):
    item = response.meta['item']
    item['real_url'] = response.url
    log.msg("item %d new URL is %s" % (item['product_id'], item['real_url']), log.INFO)  # doesn't even show the items that were redirected

The Scrapy version is 0.24. What can I do?

Interesting fact: it only happens with some of the broken links, even when they come from the same website and follow exactly the same URL pattern.

Recommended answer

Had to pass the dont_filter=True parameter to the Request itself (not to the callback function), i.e. scrapy.Request(url, callback=self.parse_url, dont_filter=True).
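The answer does not say why dont_filter=True helps; a plausible explanation is Scrapy's duplicate filter. When RedirectMiddleware sees a 301, it does not hand that response to the callback. Instead it builds a new Request for the Location URL (via request.replace, so the dont_filter flag is carried over) and sends it back to the scheduler, where it is checked against the requests already seen. If several broken links redirect to the same page, only the first redirected request gets through and the others are dropped as duplicates, so parse_url never runs for them. That would also fit the observation that only some of the broken links are affected. The toy model below is plain Python, not Scrapy code: REDIRECTS, crawl() and the example URLs are hypothetical, and the URL set is a simplified stand-in for the dupefilter's request fingerprints.

```python
from collections import deque

# Hypothetical redirect table (not real data): two broken product pages
# 301 to the same landing page, a third one 301s to a unique page.
REDIRECTS = {
    "http://www.example.com/a/-1.htm": "http://www.example.com/",
    "http://www.example.com/a/-2.htm": "http://www.example.com/",
    "http://www.example.com/a/-3.htm": "http://www.example.com/a/-3-new.htm",
}


def crawl(start_urls, dont_filter=False):
    """Toy scheduler: return (start_url, final_url) pairs that reach the callback.

    Every scheduled URL is checked against a 'seen' set unless dont_filter is
    set; following a redirect schedules a new request that keeps the flag.
    """
    seen = set()  # simplified stand-in for the dupefilter's fingerprint set
    queue = deque((url, url) for url in start_urls)  # (origin URL, URL to fetch)
    reached = []
    while queue:
        origin, url = queue.popleft()
        if not dont_filter:
            if url in seen:
                continue  # dropped as a duplicate: the callback never runs
            seen.add(url)
        target = REDIRECTS.get(url)
        if target is not None:
            # A 301 is not passed to the callback; a new request is scheduled instead.
            queue.append((origin, target))
        else:
            reached.append((origin, url))  # here the callback would be called
    return reached


urls = list(REDIRECTS)
print(len(crawl(urls)))                    # 2: the second redirect to "/" is filtered out
print(len(crawl(urls, dont_filter=True)))  # 3: every start URL reaches the callback
```

In the real spider the equivalent change is passing dont_filter=True to scrapy.Request. The trade-off is that the request is then exempt from duplicate filtering altogether, so rows that genuinely share a URL would also be fetched more than once.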
