Scrapy: yield a Request, parse in the callback, but use the info in the original function


Problem Description

I'm trying to test some webpages in Scrapy. My idea is to yield a Request to the URLs that satisfy a condition, count the number of certain items on the page, and then, back inside the original condition, return True/False depending on that count.

Here is some code to show what I mean:

def filter_categories(self, response):
    if condition:
        test = yield Request(url=link, callback=self.test_page, dont_filter=True)
        return (test, None)

def test_page(self, response):
    # ... parse the response ...
    return True  # or False, depending on the page contents

I have tried messing around with passing an item in the request, but no matter what, the return line gets triggered before test_page is ever called.

So I guess my question becomes: is there any way to pass data back to the filter_categories method in a synchronous way, so that I can use the result of test_page to decide whether or not my test is satisfied?

Any other ideas are also welcome.

Recommended Answer

Take a look at the inline_requests package, which should let you achieve this.
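A minimal sketch of how that could look, assuming scrapy-inline-requests is installed; the start URL, selectors, condition, and item-count threshold are placeholders standing in for your own logic:

from inline_requests import inline_requests
from scrapy import Request, Spider


class CategorySpider(Spider):
    name = "categories"
    start_urls = ["https://example.com/categories"]  # placeholder

    @inline_requests
    def parse(self, response):
        # The decorator lets us yield a Request and receive its Response
        # right here, instead of in a separate callback.
        for link in response.css("a.category::attr(href)").getall():  # placeholder selector
            if "category" in link:  # hypothetical condition
                test_response = yield Request(response.urljoin(link), dont_filter=True)
                passed = self.test_page(test_response)
                yield {"url": link, "passed": passed}

    def test_page(self, response):
        # Count some items on the fetched page; True/False depending.
        return len(response.css("div.item")) > 10  # placeholder selector and threshold

Because the decorated method gets each response back inline, test_page here is just a plain helper method, not a Scrapy callback.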

Another solution is not to insist on returning the result from the original method (filter_categories in your case), but rather to use request chaining with the meta attribute of requests and return the result from the last parse method in the chain (test_page in your case).
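A sketch of that chaining approach, again with placeholder URL, selectors, and condition; whatever context the final callback needs is carried along in meta:

from scrapy import Request, Spider


class CategorySpider(Spider):
    name = "categories"
    start_urls = ["https://example.com/categories"]  # placeholder

    def parse(self, response):
        for link in response.css("a.category::attr(href)").getall():  # placeholder selector
            if "category" in link:  # hypothetical condition
                # Don't try to get the result back here; pass the needed
                # context forward and let the last callback yield the result.
                yield Request(
                    response.urljoin(link),
                    callback=self.test_page,
                    dont_filter=True,
                    meta={"category_url": link},
                )

    def test_page(self, response):
        # The last parse method in the chain produces the final output.
        passed = len(response.css("div.item")) > 10  # placeholder check
        yield {
            "category_url": response.meta["category_url"],
            "passed": passed,
        }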
