Test Scrapy spider still working - find page changes


Question

How can I test a Scrapy spider against online data?

I know from this post that it is possible to test a spider against offline data.

My goal is to check whether my spider still extracts the right data from a page, or whether the page has changed. I extract the data via XPath, and sometimes the page receives an update and my scraper no longer works. I would love to have the test as close to my code as possible, e.g. using the spider and Scrapy setup and just hooking into the parse method.

Recommended answer

Referring to the link you provided, you could try this method for online testing, which I used for a problem similar to yours. Instead of reading the response from a file, use the Requests library to fetch the live web page and build a Scrapy response from what Requests returns, like below:

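import requests

from scrapy.http import Request, TextResponse


def online_response_from_url(url=None):
    """Fetch a live page with Requests and wrap it in a Scrapy TextResponse."""
    if not url:
        url = 'http://www.example.com'

    request = Request(url=url)

    # Download the current version of the page.
    oresp = requests.get(url)

    # TextResponse (not the base Response) provides .xpath() and .css().
    response = TextResponse(url=url, request=request,
                            body=oresp.text, encoding='utf-8')

    return response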
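
To turn this into an actual check for page changes, the items produced by the parse method still need to be validated. The following is only a minimal sketch of one way to do that, not part of the original answer: the helper names `find_missing_fields` and `assert_spider_output_ok`, the spider class `MySpider`, and the field names are all hypothetical, and it assumes that `parse()` yields only dict-like items (no follow-up requests).

```python
def find_missing_fields(items, required_fields):
    """Return (item_index, field_name) pairs whose value is missing or empty."""
    problems = []
    for index, item in enumerate(items):
        for field in required_fields:
            value = item.get(field)
            # An XPath that no longer matches usually yields None, "" or [].
            if value is None or value == "" or value == []:
                problems.append((index, field))
    return problems


def assert_spider_output_ok(items, required_fields):
    """Fail if parse() produced nothing or left required fields empty."""
    items = list(items)  # parse() usually returns a generator
    assert items, "parse() yielded no items; the page layout may have changed"
    problems = find_missing_fields(items, required_fields)
    assert not problems, f"empty fields (item index, field): {problems}"


# Hypothetical usage inside a test, reusing online_response_from_url():
#   spider = MySpider()
#   response = online_response_from_url("http://www.example.com")
#   assert_spider_output_ok(spider.parse(response), ["title", "price"])
```

Checking for empty fields rather than exact values keeps the test stable when the live content changes but still fails when an XPath stops matching.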
