Python decode nested JSON in JSON

Problem description

I'm dealing with an API that unfortunately returns malformed (or rather "weirdly formed", thanks @fjarri) JSON. On the positive side, I think it may be an opportunity for me to learn something about recursion as well as JSON. It's for an app I use to log my workouts, and I'm trying to make a backup script.

I can receive the JSON fine, but even after requests.get(api_url).json() (or json.loads(requests.get(api_url).text)), one of the values is still a JSON-encoded string. Luckily, I can just json.loads() that string and it properly decodes to a dict. The specific key is predictable: timezone_id, whereas its value varies (because data has been logged in multiple timezones). For example, after decoding, it might be dumped to a file as "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}", or loaded into Python as 'timezone_id': '{"name":"America/Denver","seconds":"-21600"}'.
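To make the double encoding concrete, here is a minimal sketch with a shortened, made-up payload (the variable names are illustrative and not part of the API). It only shows that the value survives the first decode as a str and needs a second json.loads():

```python
import json

# Shortened, hypothetical payload: the timezone_id value is itself a JSON
# document stored inside a JSON string.
payload = r'{"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}'

decoded = json.loads(payload)
# After the first decode the value is still a str, not a dict.
print(type(decoded["timezone_id"]))  # <class 'str'>

# A second json.loads() on that string produces the dict.
timezone = json.loads(decoded["timezone_id"])
print(timezone["name"])  # America/Denver
```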

The problem is that I'm using this API to retrieve a fair amount of data, which has several layers of dicts and lists, and the double-encoded timezone_id values occur at multiple levels.

Here's my work so far with some example data, but it seems like I'm pretty far off base.

#! /usr/bin/env python3

import json
from pprint import pprint

my_input = r"""{
    "hasMore": false,
    "checkins": [
        {
            "timestamp": 1353193745000,
            "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
            "privacy_groups": [
                "private"
            ],
            "meta": {
                "client_version": "3.0",
                "uuid": "fake_UUID"
            },
            "client_id": "fake_client_id",
            "workout_name": "Workout (Nov 17, 2012)",
            "fitness_workout_json": {
                "exercise_logs": [
                    {
                        "timestamp": 1353195716000,
                        "type": "exercise_log",
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    },
                    {
                        "timestamp": 1353195340000,
                        "type": "exercise_log",
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    }
                ]
            },
            "workout_uuid": ""
        },
        {
            "timestamp": 1354485615000,
            "user_id": "fake_ID",
            "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
            "privacy_groups": [
                "private"
            ],
            "meta": {
                "uuid": "fake_UUID"
            },
            "created": 1372023457376,
            "workout_name": "Workout (Dec 02, 2012)",
            "fitness_workout_json": {
                "exercise_logs": [
                    {
                        "timestamp": 1354485615000,
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    },
                    {
                        "timestamp": 1354485584000,
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    }
                ]
            },
            "workout_uuid": ""
        }]}"""

def recurse(obj):
    if isinstance(obj, list):
        for item in obj:
            return recurse(item)
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, str):
                try:
                    v = json.loads(v)
                except ValueError:
                    pass
                obj.update({k: v})
            elif isinstance(v, (dict, list)):
                return recurse(v)

pprint(json.loads(my_input, object_hook=recurse))

Any suggestions for a good way to json.loads() all those double-encoded values without changing the rest of the object? Many thanks in advance!

This post seems to be a good reference: Modifying Deeply-Nested Structures

This was flagged as a possible duplicate of this question. I think it's fairly different, as I've already demonstrated that using json.loads() alone was not working. The solution ended up requiring an object_hook, which I've never had to use when decoding JSON and which is not addressed in the prior question.

Recommended answer

So, the object_hook in the json loader is going to be called each time the loader finishes constructing a dictionary. That is, the first thing it is called on is the innermost dictionary, and it works outwards from there.
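A small sketch of that call order, using a made-up nested document and a hypothetical tracing hook that just records the keys of each dict it is handed:

```python
import json

seen = []  # keys of each dict, in the order the decoder hands them to the hook

def tracing_hook(obj):
    # Hypothetical hook for illustration: note the keys, return the dict unchanged.
    seen.append(sorted(obj))
    return obj

json.loads('{"outer": {"middle": {"inner": 1}}}', object_hook=tracing_hook)
print(seen)  # [['inner'], ['middle'], ['outer']]
```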

The dictionary passed to the object_hook callback is replaced by whatever that function returns.
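As a quick illustration of that replacement (the to_pair hook and the "tz" key are made up for this sketch), a hook can return something other than a dict and the decoder will place that value in the parent:

```python
import json

def to_pair(obj):
    # Illustrative hook: swap any {"name", "seconds"} dict for a tuple,
    # and leave every other dict as it is.
    if set(obj) == {"name", "seconds"}:
        return (obj["name"], int(obj["seconds"]))
    return obj

text = '{"tz": {"name": "America/Denver", "seconds": "-21600"}}'
result = json.loads(text, object_hook=to_pair)
print(result)  # {'tz': ('America/Denver', -21600)}
```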

So, you don't need to recurse yourself. By its nature, the loader gives you access to the innermost things first.

I think this would work for you:

import json

def hook(obj):
    # Called once for every decoded dict; only the known key is touched.
    value = obj.get("timezone_id")
    # this is python 3 specific; I would check isinstance against
    # basestring in python 2
    if value and isinstance(value, str):
        obj["timezone_id"] = json.loads(value, object_hook=hook)
    return obj

# my_input is the sample JSON string from the question
data = json.loads(my_input, object_hook=hook)

When I test it, it seems to have the effect I think you're looking for.
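One way to check this yourself is sketched below. It repeats the hook so the snippet runs on its own, uses a trimmed version of the sample data, and adds a hypothetical helper, timezone_values, that collects every timezone_id at any depth:

```python
import json

def hook(obj):
    # Same approach as the hook above: decode only the known key.
    value = obj.get("timezone_id")
    if value and isinstance(value, str):
        obj["timezone_id"] = json.loads(value, object_hook=hook)
    return obj

def timezone_values(node):
    # Hypothetical helper: yield every timezone_id value at any depth.
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "timezone_id":
                yield value
            else:
                yield from timezone_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from timezone_values(item)

# Trimmed version of the question's sample data.
sample = r"""{
    "checkins": [
        {
            "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
            "fitness_workout_json": {
                "exercise_logs": [
                    {"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}
                ]
            }
        }
    ]
}"""

data = json.loads(sample, object_hook=hook)
found = list(timezone_values(data))
print(len(found), all(isinstance(tz, dict) for tz in found))  # 2 True
```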

I probably wouldn't try to decode every string value. Instead, I would call json.loads() only where you expect a double-encoded JSON object to exist. If you try to decode every string, you might accidentally decode something that is supposed to stay a string (like "12345", when the API intends it as a string).
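A short sketch of that pitfall follows. The decode_every_string function and the "12345" user_id are made up for illustration, while "client_version": "3.0" and the empty workout_uuid mirror the question's sample data:

```python
import json

record = json.loads('{"client_version": "3.0", "user_id": "12345", "workout_uuid": ""}')

def decode_every_string(obj):
    # Illustrative only: the blanket approach advised against above.
    out = {}
    for key, value in obj.items():
        if isinstance(value, str):
            try:
                value = json.loads(value)
            except ValueError:
                pass  # not valid JSON, keep the string
        out[key] = value
    return out

print(decode_every_string(record))
# {'client_version': 3.0, 'user_id': 12345, 'workout_uuid': ''}
```

The version string silently becomes a float and the ID becomes an int, which is probably not what a backup script wants.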

Also, your existing function is more complicated than it needs to be; it might work as-is if you always returned obj (whether or not you update its contents).
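For what that might look like, here is a sketch of the question's recurse function reduced to a plain hook that always returns obj. This is one reading of the suggestion, not a tested drop-in, and it still decodes every string, so the caveat above applies. The sample is a trimmed, made-up subset of the question's data:

```python
import json

def recurse(obj):
    # The decoder calls this once per dict, innermost first, so no manual
    # recursion into lists or nested dicts is needed here.
    for k, v in obj.items():
        if isinstance(v, str):
            try:
                obj[k] = json.loads(v)
            except ValueError:
                pass  # not JSON; keep the original string
    return obj  # always return the dict, updated or not

sample = r'{"checkins": [{"workout_name": "Workout (Nov 17, 2012)", "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}]}'
data = json.loads(sample, object_hook=recurse)
print(data["checkins"][0]["timezone_id"]["name"])  # America/Denver
```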
