Scrapy Unit Testing
Question
I'd like to implement some unit tests in a Scrapy project (screen scraper/web crawler). Since a project is run through the "scrapy crawl" command, I could run it through something like nose. Since Scrapy is built on top of Twisted, can I use its unit testing framework, Trial? If so, how? Otherwise I'd like to get nose working.
Update:
I've been talking on Scrapy-Users and I guess I am supposed to "build the Response in the test code, and then call the method with the response and assert that [I] get the expected items/requests in the output". I can't seem to get this to work though.
I can build a unit test class, and in a test:
- create a Response object
- try to call my spider's parse method with the Response object
However it ends up generating this traceback. Any insight as to why?
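To make the suggested pattern concrete, here is a minimal, Scrapy-free sketch of "build the response, call the method, assert on the output". `FakeResponse` and `TitleSpider` are hypothetical stand-ins for illustration, not Scrapy classes; the answer below shows how to do this with real Scrapy response objects.

```python
# Sketch of the pattern: build a response object, feed it to the
# parse method, and assert on the yielded items.
# FakeResponse and TitleSpider are hypothetical stand-ins, not Scrapy classes.
import re
import unittest


class FakeResponse:
    """Minimal stand-in for a Scrapy HTTP response."""
    def __init__(self, url, body):
        self.url = url
        self.body = body


class TitleSpider:
    """Toy spider: parse() yields one item per <h1> in the body."""
    def parse(self, response):
        for title in re.findall(r"<h1>(.*?)</h1>", response.body):
            yield {"title": title}


class TitleSpiderTest(unittest.TestCase):
    def test_parse(self):
        response = FakeResponse(
            url="http://www.example.com",
            body="<html><h1>First</h1><h1>Second</h1></html>",
        )
        items = list(TitleSpider().parse(response))
        self.assertEqual(len(items), 2)
        self.assertEqual(items[0]["title"], "First")
```

Note that `parse` is a generator, so the test materializes it with `list()` before asserting; forgetting this is a common source of tests that silently pass without checking anything.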
Answer
The way I've done it is to create fake responses; this way you can test the parse function offline, while still exercising real HTML.
A problem with this approach is that your local HTML file may not reflect the latest state online. So if the HTML changes online, you may have a serious bug while your test cases still pass. It may not be the best way to test, but it works offline.
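One way to mitigate stale fixtures is a small helper that re-downloads a page and overwrites the local HTML file, so fixtures can be refreshed periodically. This is a hypothetical addition to the workflow, not part of the original answer; the injectable `fetch` parameter exists so the helper itself can be tested offline.

```python
# Hypothetical helper to refresh a saved HTML fixture from its live URL.
import urllib.request


def refresh_fixture(url, file_path, fetch=urllib.request.urlopen):
    """Fetch `url` and overwrite `file_path` with the raw response bytes.

    `fetch` is injectable so the helper can be tested without network access.
    """
    with fetch(url) as resp:
        content = resp.read()
    with open(file_path, "wb") as f:
        f.write(content)
    return content
```

Run it occasionally (e.g. from a cron job) against each fixture's source URL, and re-run the test suite to catch pages that have drifted from the saved copies.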
My current workflow is: whenever there is an error, I send an email to the admin with the URL. Then for that specific error I create an HTML file with the content that causes the error, and write a unit test for it.
This is the code I use to create sample Scrapy HTTP responses for testing from a local HTML file:
# scrapyproject/tests/responses/__init__.py
import os
from scrapy.http import HtmlResponse, Request

def fake_response_from_file(file_name, url=None):
    """
    Create a fake Scrapy HTTP response from an HTML file.

    @param file_name: The relative filename from the responses directory,
                      but absolute paths are also accepted.
    @param url: The URL of the response.
    returns: A Scrapy HTTP response which can be used for unit testing.
    """
    if not url:
        url = 'http://www.example.com'
    request = Request(url=url)
    if not file_name[0] == '/':
        responses_dir = os.path.dirname(os.path.realpath(__file__))
        file_path = os.path.join(responses_dir, file_name)
    else:
        file_path = file_name
    with open(file_path, 'r') as f:
        file_content = f.read()
    # Use HtmlResponse rather than the base Response: it accepts an
    # encoding argument and gives parse methods working selectors.
    response = HtmlResponse(url=url,
                            request=request,
                            body=file_content,
                            encoding='utf-8')
    return response
The sample HTML file is located at scrapyproject/tests/responses/osdir/sample.html
The test case, located at scrapyproject/tests/test_osdir.py, could then look as follows:
import unittest

from scrapyproject.spiders import osdir_spider
from responses import fake_response_from_file

class OsdirSpiderTest(unittest.TestCase):

    def setUp(self):
        self.spider = osdir_spider.DirectorySpider()

    def _test_item_results(self, results, expected_length):
        count = 0
        for item in results:
            count += 1  # count the yielded items
            self.assertIsNotNone(item['content'])
            self.assertIsNotNone(item['title'])
        # Compare the total only after consuming the generator.
        self.assertEqual(count, expected_length)

    def test_parse(self):
        results = self.spider.parse(fake_response_from_file('osdir/sample.html'))
        self._test_item_results(results, 10)
That's basically how I test my parsing methods, though the approach isn't limited to parse methods. If it gets more complex, I suggest looking at Mox.
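Mox has largely been superseded by the standard library's unittest.mock, which covers the same stubbing use case. Here is a hedged sketch of mocking a collaborator; `FeedSpider` and its `client` attribute are hypothetical examples, not part of the Scrapy API.

```python
# Sketch of stubbing a collaborator with unittest.mock (a modern
# alternative to Mox). FeedSpider and `client` are hypothetical names.
import unittest
from unittest import mock


class FeedSpider:
    """Toy spider that delegates fetching of extra data to a client."""
    def __init__(self, client):
        self.client = client

    def enrich(self, item):
        item["extra"] = self.client.lookup(item["title"])
        return item


class FeedSpiderTest(unittest.TestCase):
    def test_enrich_uses_client(self):
        # The mock replaces the real client, so no network I/O happens.
        client = mock.Mock()
        client.lookup.return_value = "details"
        spider = FeedSpider(client)

        item = spider.enrich({"title": "First"})

        self.assertEqual(item["extra"], "details")
        client.lookup.assert_called_once_with("First")
```

Injecting the collaborator through the constructor, as above, is what makes the mock easy to substitute; the same shape works for pipelines or downloaders a spider depends on.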