Scrapy returns more results than expected


Problem description

This is a follow-up to the question: Extract from dynamic JSON response with Scrapy

I have a Scrapy spider that extracts values from a JSON response. It works well and extracts the right values, but somehow it enters a loop and returns more results than expected (duplicate results).

For example, for the 17 values provided in the test.txt file it returns 289 results, which is 17 times more than expected.

The spider's contents are below:

import scrapy
import json
from whois.items import WhoisItem

class whoislistSpider(scrapy.Spider):
    name = "whois_list"
    start_urls = []
    f = open('test.txt', 'r')
    global lines
    lines = f.read().splitlines()
    f.close()
    def __init__(self):
        for line in lines:
            self.start_urls.append('http://www.example.com/api/domain/check/%s/com' % line)

    def parse(self, response):
        for line in lines:
            jsonresponse = json.loads(response.body_as_unicode())
            item = WhoisItem()
            domain_name = list(jsonresponse['domains'].keys())[0]
            item["avail"] = jsonresponse["domains"][domain_name]["avail"]
            item["domain"] = domain_name
            yield item

Contents of items.py below:

import scrapy

class WhoisItem(scrapy.Item):
    avail = scrapy.Field()
    domain = scrapy.Field()

Contents of pipelines.py below:

class WhoisPipeline(object):
    def process_item(self, item, spider):
        return item

Thank you in advance for all the replies.

Recommended answer

The parse function should look like this:

def parse(self, response):
    jsonresponse = json.loads(response.body_as_unicode())
    item = WhoisItem()
    domain_name = list(jsonresponse['domains'].keys())[0]
    item["avail"] = jsonresponse["domains"][domain_name]["avail"]
    item["domain"] = domain_name
    yield item

Notice that I removed the for loop.
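The extraction step inside parse can also be written as a plain function that takes the decoded response body, which makes it easy to check outside of Scrapy. This is only a minimal sketch: the name extract_item and the sample body are hypothetical, and the JSON shape is assumed from the spider in the question. In parse you would pass it the decoded body (response.body_as_unicode() as used above, or response.text in newer Scrapy versions).

```python
import json


def extract_item(body_text):
    """Build one record from one JSON response body.

    Assumes a body shaped like {"domains": {"<name>": {"avail": ...}}},
    as implied by the spider in the question.
    """
    data = json.loads(body_text)
    # Each response describes a single domain, so take the first key.
    domain_name = next(iter(data["domains"]))
    return {
        "domain": domain_name,
        "avail": data["domains"][domain_name]["avail"],
    }


# Hypothetical sample body, made up for illustration only.
sample_body = '{"domains": {"example.com": {"avail": false}}}'
record = extract_item(sample_body)
print(record)
```

Because the function handles exactly one response and returns exactly one record, there is no place for a loop over the input file to sneak in.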

What was happening: Scrapy calls parse once for each response, and for every single response the loop parsed it and yielded an item 17 times, once per line of test.txt. With 17 responses, that results in 17 * 17 = 289 records.
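The counting can be reproduced without Scrapy. The sketch below is a simulation, not the spider itself: parse_with_loop and parse_once are hypothetical stand-ins for the two versions of parse, and the strings stand in for real responses.

```python
def parse_with_loop(response_body, lines):
    # Mirrors the original parse(): one item per line of test.txt,
    # repeated for every response.
    for _ in lines:
        yield response_body


def parse_once(response_body):
    # Mirrors the corrected parse(): one item per response.
    yield response_body


# 17 made-up input values, one request (and one response) per value.
lines = ["name%d" % i for i in range(17)]
responses = ["response for %s" % line for line in lines]

# The framework calls the callback once per response.
looped = [item for body in responses for item in parse_with_loop(body, lines)]
fixed = [item for body in responses for item in parse_once(body)]

print(len(looped), len(fixed))
```

The looped version produces 17 copies of each of the 17 results, while the fixed version produces each result once.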
