How to send another request and get result in scrapy parse function?
Question
I'm analyzing an HTML page which has a two-level menu. When the top-level menu changes, an AJAX request is sent to fetch the second-level menu items. When both the top-level and second-level menus are selected, the content is refreshed.
What I need is to send another request and get the submenu response inside scrapy's parse function, so that I can iterate over the submenu items and build a scrapy.Request per submenu item.
Pseudo code follows:
```python
def parse(self, response):
    top_level_menu = response.xpath('//TOP_LEVEL_MENU_XPATH')
    second_level_menu_items = ## HERE I NEED TO SEND A REQUEST AND GET RESULT, PARSED TO ITEM VALUE LIST
    for second_menu_item in second_level_menu_items:
        yield scrapy.Request(
            response.urljoin(content_request_url + '?top_level=' + top_level_menu
                             + '&second_level_menu=' + second_menu_item),
            callback=self.parse_content)
```
How can I do this? Should I use the requests library directly, or is there some other feature provided by scrapy?
Recommended answer
The recommended approach here is to create another callback (parse_second_level_menus?) to handle the response for the second-level menu items and, in there, create the requests to the content pages.
Also, you can use the request.meta attribute to pass data between callback methods (more info in the Scrapy documentation on passing additional data to callback functions).
It could be something like this:
```python
def parse(self, response):
    top_level_menu = response.xpath('//TOP_LEVEL_MENU_XPATH').get()

    yield scrapy.Request(
        some_url,
        callback=self.parse_second_level_menus,
        # pass the top_level_menu value to the other callback
        meta={'top_menu': top_level_menu},
    )

def parse_second_level_menus(self, response):
    # read the data passed in the meta by the first callback
    top_level_menu = response.meta.get('top_menu')

    second_level_menu_items = response.xpath('...').getall()
    for second_menu_item in second_level_menu_items:
        url = response.urljoin(
            content_request_url + '?top_level=' + top_level_menu
            + '&second_level_menu=' + second_menu_item)
        yield scrapy.Request(url, callback=self.parse_content)

def parse_content(self, response):
    ...
```
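A side note on the URL building above: concatenating the menu values straight into the query string breaks as soon as a value contains a space, '&' or '?'. urllib.parse.urlencode handles the escaping; a minimal sketch (build_content_url is a hypothetical helper, with the parameter names taken from the question):

```python
from urllib.parse import urlencode

def build_content_url(base_url, top_level, second_level):
    # urlencode percent-/plus-escapes spaces, '&', '?' etc. in the values
    query = urlencode({'top_level': top_level,
                       'second_level_menu': second_level})
    return base_url + '?' + query

print(build_content_url('https://example.com/content', 'menu A', 'sub&1'))
```

Inside the spider you would pass the result to response.urljoin as before.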
Yet another approach (less recommended in this case) would be to use this library: https://github.com/rmax/scrapy-inline-requests