How to send another request and get result in scrapy parse function?

Problem description

I'm analyzing an HTML page which has a two-level menu. When the top-level menu changes, an AJAX request is sent to fetch the second-level menu items. When both the top and second menus are selected, the content is refreshed.

What I need is to send another request and get the submenu response inside Scrapy's parse function, so that I can iterate over the submenu and build a scrapy.Request per submenu item.

The pseudocode is as follows:

def parse(self, response):
    top_level_menu = response.xpath('//TOP_LEVEL_MENU_XPATH')
    second_level_menu_items = ## HERE I NEED TO SEND A REQUEST AND GET RESULT, PARSED TO ITEM VALUE LIST

    for second_menu_item in second_level_menu_items:
        yield scrapy.Request(response.urljoin(content_request_url + '?top_level=' + top_level_menu + '&second_level_menu=' + second_menu_item), callback=self.parse_content)

How can I do this?

Should I use the requests library directly, or is there some other feature provided by Scrapy?

Recommended answer

The recommended approach here is to create another callback (parse_second_level_menus?) to handle the response for the second-level menu items and, in there, create the requests to the content pages.

Also, you can use the request.meta attribute to pass data between callback methods (see Request.meta in the Scrapy documentation for more info).

It could be something like this:

def parse(self, response):
    top_level_menu = response.xpath('//TOP_LEVEL_MENU_XPATH').get()
    yield scrapy.Request(
        some_url,
        callback=self.parse_second_level_menus,
        # pass the top_level_menu value to the other callback
        meta={'top_menu': top_level_menu},
    )

def parse_second_level_menus(self, response):
    # read the data passed in the meta by the first callback
    top_level_menu = response.meta.get('top_menu')
    second_level_menu_items = response.xpath('...').getall()

    for second_menu_item in second_level_menu_items:
        url = response.urljoin(content_request_url + '?top_level=' + top_level_menu + '&second_level_menu=' + second_menu_item)
        yield scrapy.Request(
            url,
            callback=self.parse_content,
        )

def parse_content(self, response):
    ...
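
As a side note, the query string in the loop above is built by plain concatenation, so the menu values are not URL-encoded. A small sketch of that same URL-building step using urllib.parse.urlencode (reusing the content_request_url placeholder from above):

from urllib.parse import urlencode

# Inside the for loop above: build the query string with proper
# URL-encoding of the menu values.
params = urlencode({
    'top_level': top_level_menu,
    'second_level_menu': second_menu_item,
})
url = response.urljoin(content_request_url + '?' + params)
yield scrapy.Request(url, callback=self.parse_content)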

Yet another approach (less recommended in this case) would be using this library: https://github.com/rmax/scrapy-inline-requests
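
With that library, the callback is decorated so that a yielded request with no callback hands its response back inline, inside the same method. A minimal sketch, loosely following the library's README and reusing the some_url / content_request_url placeholders and XPaths from above (the MenuSpider name is just for illustration):

import scrapy
from inline_requests import inline_requests


class MenuSpider(scrapy.Spider):
    name = 'menu'

    @inline_requests
    def parse(self, response):
        top_level_menu = response.xpath('//TOP_LEVEL_MENU_XPATH').get()

        # Yielding a request with no callback returns its response here,
        # instead of dispatching it to another callback method.
        submenu_response = yield scrapy.Request(some_url)
        second_level_menu_items = submenu_response.xpath('...').getall()

        for second_menu_item in second_level_menu_items:
            url = response.urljoin(content_request_url + '?top_level=' + top_level_menu
                                   + '&second_level_menu=' + second_menu_item)
            # The content page is also fetched inline, so it can be
            # parsed right here instead of in parse_content.
            content_response = yield scrapy.Request(url)
            yield {'content': content_response.xpath('...').get()}

Note that requests yielded this way are awaited one at a time inside the callback, which is part of the trade-off compared with the separate-callback approach above.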
