Scrapy: yield a Request, parse in the callback, but use the info in the original function
Problem description
I'm trying to test some web pages in Scrapy. My idea is to yield a Request to each URL that satisfies a condition, count the number of certain items on the page, and then, back inside the original condition, return True or False depending on that count.
Here is some code to show what I mean:
def filter_categories(self, link):
    if condition:
        # Intended: test should end up holding the result of test_page
        test = yield Request(url=link, callback=self.test_page, dont_filter=True)
        return (test, None)

def test_page(self, response):
    ... parse the response ...
    return True/False depending
I have tried passing an item along in the request, but no matter what I do, the return line is triggered before test_page is ever called.
So I guess my question is: is there any way to pass data back to the filter_categories method synchronously, so that I can use the result of test_page to decide whether my test is satisfied?
Any other ideas are welcome too.
Recommended answer
Take a look at the inline_requests package, which should let you achieve this.
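To see why `test = yield Request(...)` never receives anything on its own, and roughly what a decorator like inline_requests has to do, here is a toy sketch that does not use Scrapy at all. FakeRequest, fake_download, filter_category and run_inline are hypothetical names made up for this illustration, and this is not the package's actual implementation. The point is only that a yield expression evaluates to whatever the driver sends back into the generator; a driver that merely iterates sends back nothing, so the expression is None.

```python
class FakeRequest:
    """Stand-in for scrapy.Request (hypothetical, illustration only)."""

    def __init__(self, url):
        self.url = url


def fake_download(request):
    """Pretend to fetch a page: return a canned body for the URL."""
    pages = {
        "http://example.com/a": "item item item",
        "http://example.com/b": "item",
    }
    return pages.get(request.url, "")


def filter_category(url):
    # "Inline" style: the yield expression evaluates to whatever the
    # driver sends back into the generator.
    body = yield FakeRequest(url)
    return body.count("item") >= 2


def run_inline(gen):
    """Drive a generator, sending each 'response' back to its yield."""
    try:
        request = next(gen)  # run up to the first yield
        while True:
            # Resume the generator with the downloaded body as the
            # value of the pending yield expression.
            request = gen.send(fake_download(request))
    except StopIteration as stop:
        return stop.value  # the generator's return value
```

With the real package, the documented pattern is to decorate the callback with `@inline_requests` and write `response = yield Request(url)` inside it; check the package README for its limitations before relying on it.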
Another solution is to not insist on returning the result from the original method (filter_categories in your case). Instead, chain the requests, pass the data you need along in the meta attribute of each Request, and return the result from the last parse method in the chain (test_page in your case).