Scrapy returns more results than expected
Question
This is a follow-up to the question: Extracting from a dynamic JSON response with Scrapy
I have a Scrapy spider that extracts values from a JSON response. It works well and extracts the right values, but somehow it enters a loop and returns more results than expected (duplicate results).
For example, for 17 values provided in the test.txt file it returns 289 results, which is 17 times more than expected.
Spider content below:
import scrapy
import json
from whois.items import WhoisItem

class whoislistSpider(scrapy.Spider):
    name = "whois_list"
    start_urls = []
    f = open('test.txt', 'r')
    global lines
    lines = f.read().splitlines()
    f.close()

    def __init__(self):
        for line in lines:
            self.start_urls.append('http://www.example.com/api/domain/check/%s/com' % line)

    def parse(self, response):
        for line in lines:
            jsonresponse = json.loads(response.body_as_unicode())
            item = WhoisItem()
            domain_name = list(jsonresponse['domains'].keys())[0]
            item["avail"] = jsonresponse["domains"][domain_name]["avail"]
            item["domain"] = domain_name
            yield item
items.py content below:
import scrapy

class WhoisItem(scrapy.Item):
    avail = scrapy.Field()
    domain = scrapy.Field()
pipelines.py below:
class WhoisPipeline(object):
    def process_item(self, item, spider):
        return item
Thank you in advance for all the replies.
Recommended answer
The parse function should look like this:
def parse(self, response):
    jsonresponse = json.loads(response.body_as_unicode())
    item = WhoisItem()
    domain_name = list(jsonresponse['domains'].keys())[0]
    item["avail"] = jsonresponse["domains"][domain_name]["avail"]
    item["domain"] = domain_name
    yield item
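The extraction step can be tried on its own, without running a crawl. The following is a minimal sketch, assuming the API returns JSON shaped like {"domains": {"<name>": {"avail": ...}}}, which is inferred from the keys the spider reads. The helper name extract_item and the sample body are made up for illustration and are not part of the original project.

```python
import json


def extract_item(body_text):
    """Return one record for one response body (hypothetical helper)."""
    data = json.loads(body_text)
    # Assumption: each response describes exactly one domain, so take its only key.
    domain_name = next(iter(data["domains"]))
    return {
        "domain": domain_name,
        "avail": data["domains"][domain_name]["avail"],
    }


# Made-up sample body, shaped after the keys the spider reads.
sample_body = '{"domains": {"example.com": {"avail": false}}}'
record = extract_item(sample_body)
print(record)
```

Inside the spider, the decoded body would be passed in from the response. In recent Scrapy releases response.text is the documented way to get it, while body_as_unicode() is an older spelling.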
Notice that I removed the for loop.
What was happening: Scrapy calls parse once per response, and for every single response you would loop over all 17 lines and parse it 17 times, resulting in 17 * 17 = 289 records.
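The multiplication can be reproduced in plain Python. This is only a sketch that simulates the callbacks without Scrapy: the 17 input names, the fake JSON bodies, and the function names parse_buggy and parse_fixed are all made up for illustration.

```python
import json

# 17 made-up input lines, standing in for the contents of test.txt.
lines = ["domain%d" % i for i in range(17)]

# One fake JSON body per request, shaped after the keys the spider reads.
bodies = [
    json.dumps({"domains": {name + ".com": {"avail": True}}})
    for name in lines
]


def parse_buggy(body):
    # Loops over every input line although the body describes one domain,
    # so the same item is yielded len(lines) times.
    for _ in lines:
        data = json.loads(body)
        domain_name = list(data["domains"].keys())[0]
        yield {"domain": domain_name, "avail": data["domains"][domain_name]["avail"]}


def parse_fixed(body):
    # One response in, one item out.
    data = json.loads(body)
    domain_name = list(data["domains"].keys())[0]
    yield {"domain": domain_name, "avail": data["domains"][domain_name]["avail"]}


# The callback runs once per response, as it does in a crawl.
buggy_items = [item for body in bodies for item in parse_buggy(body)]
fixed_items = [item for body in bodies for item in parse_fixed(body)]

print(len(buggy_items))  # 289
print(len(fixed_items))  # 17
```

The buggy version still contains only 17 distinct domains; each one is simply repeated 17 times, which matches the duplicate results described in the question.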