Extract from dynamic JSON response with Scrapy


Question

I want to extract the 'avail' value from a JSON response that looks like this.

{
    "result": {
        "code": 100,
        "message": "Command Successful"
    },
    "domains": {
        "yolotaxpayers.com": {
            "avail": false,
            "tld": "com",
            "price": "49.95",
            "premium": false,
            "backorder": true
        }
    }
}

The problem is that the ['avail'] value is nested under ["domains"]["domain_name"], and I can't figure out how to get the domain name.

My spider is below. The first part works fine, but the second does not.

import scrapy
import json
from whois.items import WhoisItem

class whoislistSpider(scrapy.Spider):
    name = "whois_list"

    def __init__(self):
        # Read one domain name per line and build the check URLs
        with open('test.txt', 'r') as f:
            lines = f.read().splitlines()
        self.start_urls = [
            'http://www.example.com/api/domain/check/%s/com' % line
            for line in lines
        ]

    def parse(self, response):
        # Each response covers a single domain, so parse it once
        # (looping over the input lines here would yield duplicate items)
        jsonresponse = json.loads(response.body_as_unicode())
        item = WhoisItem()
        domain_name = list(jsonresponse['domains'].keys())[0]
        item["avail"] = jsonresponse["domains"][domain_name]["avail"]
        item["domain"] = domain_name
        yield item

Thanks in advance for your replies.

Answer

Assuming you are only expecting one result per response:

domain_name = list(jsonresponse['domains'].keys())[0]
item["avail"] = jsonresponse["domains"][domain_name]["avail"]

This works even if the domain listed in "test.txt" does not exactly match the domain key in the result.
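To illustrate the lookup on its own, here is a minimal standalone sketch against the sample response from the question (plain `json`, no Scrapy required):

```python
import json

# Sample API response from the question
payload = '''
{
    "result": {"code": 100, "message": "Command Successful"},
    "domains": {
        "yolotaxpayers.com": {
            "avail": false,
            "tld": "com",
            "price": "49.95",
            "premium": false,
            "backorder": true
        }
    }
}
'''

data = json.loads(payload)

# The domain name is whatever single key sits under "domains",
# so grab the first (and only) key rather than hard-coding it.
domain_name = list(data["domains"].keys())[0]
avail = data["domains"][domain_name]["avail"]

print(domain_name, avail)  # yolotaxpayers.com False
```

As a side note, on Scrapy 2.2 or later you can get the same dict directly with `response.json()` instead of `json.loads(response.body_as_unicode())`.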
