Test that a Scrapy spider still works - find page changes
Question
How can I test a Scrapy spider against online data?
I know from this post that it is possible to test a spider against offline data.
My goal is to check whether my spider still extracts the right data from a page, or whether the page has changed. I extract the data via XPath, and sometimes the page receives an update and my scraper no longer works. I would like the test to stay as close to my code as possible, e.g., by using the spider and Scrapy setup and just hooking into the parse method.
Recommended answer
Building on the link you provided, you could try the following approach for online testing, which I used for a problem similar to yours. Instead of reading the page from a file, use the Requests library to fetch the live page and build a Scrapy response from what Requests returns:
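import requests
from scrapy.http import Request, TextResponse


def online_response_from_url(url=None):
    # Fall back to a default page when no URL is given
    if not url:
        url = 'http://www.example.com'
    # Scrapy Request attached to the response, so response.request works in parse()
    request = Request(url=url)
    # Fetch the live page with the Requests library; fail loudly if it is gone
    oresp = requests.get(url, timeout=30)
    oresp.raise_for_status()
    # Wrap the downloaded HTML in a TextResponse that the spider can parse
    response = TextResponse(url=url, request=request,
                            body=oresp.text, encoding='utf-8')
    return response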