Recursive generator function for nested JSON data in Python


Question

I'm trying to write a recursive generator function to flatten a nested JSON object made up of mixed types, lists, and dictionaries. I'm doing this partly for my own learning, so I have avoided copying an example from the internet to make sure I understand what's happening. I've got stuck, though, on what I think is the correct placement of the yield statement in the function relative to the loop.

The data passed to the generator function comes from an outer loop that iterates through a Mongo collection.

When I use a print statement in the same place as the yield statement, I get the results I expect. When I switch it to a yield statement, the generator seems to yield only one item per iteration of the outer loop.

Hopefully someone can show me where I'm going wrong.

from pymongo import MongoClient

columns = ['_id'
    , 'name'
    , 'personId'
    , 'status'
    , 'explorerProgress'
    , 'isSelectedForReview'
           ]
db = MongoClient().abcDatabase

coll = db.abcCollection


def dic_recurse(data, fields, counter, source_field):
    counter += 1
    if isinstance(data, dict):
        for k, v in data.items():
            if k in fields and isinstance(v, list) is False and isinstance(v, dict) is False:
                # print "{0}{1}".format(source_field, k)[1:], v
                yield "{0}{1}".format(source_field, k)[1:], v
            elif isinstance(v, list):
                source_field += "_{0}".format(k)
                [dic_recurse(l, fields, counter, source_field) for l in data.get(k)]
            elif isinstance(v, dict):
                source_field += "_{0}".format(k)
                dic_recurse(v, fields, counter, source_field)
    elif isinstance(data, list):
        [dic_recurse(l, fields, counter, '') for l in data]


for item in coll.find():
    for d in dic_recurse(item, columns, 0, ''):
        print d

Below is a sample of the data it iterates over, although the nesting goes deeper than what is shown.

{ 
    "_id" : ObjectId("5478464ee4b0a44213e36eb0"), 
    "consultationId" : "54784388e4b0a44213e36d5f", 
    "modules" : [
        {
            "_id" : "FF", 
            "name" : "Foundations", 
            "strategyHeaders" : [
                {
                    "_id" : "FF_Money", 
                    "description" : "Let's see where you're spending your money.", 
                    "name" : "Managing money day to day", 
                    "statuses" : [
                        {
                            "pid" : "54784388e4b0a44213e36d5d", 
                            "status" : "selected", 
                            "whenUpdated" : NumberLong(1425017616062)
                        }, 
                        {
                            "pid" : "54783da8e4b09cf5d82d4e11", 
                            "status" : "selected", 
                            "whenUpdated" : NumberLong(1425017616062)
                        }
                    ], 
                    "strategies" : [
                        {
                            "_id" : "FF_Money_CF", 
                            "description" : "This option helps you get a picture of how much you're spending", 
                            "name" : "Your spending and savings.", 
                            "relatedGoals" : [
                                {
                                    "_id" : ObjectId("54784581e4b0a44213e36e2f")
                                }, 
                                {
                                    "_id" : ObjectId("5478458ee4b0a44213e36e33")
                                }, 
                                {
                                    "_id" : ObjectId("547845a5e4b0a44213e36e37")
                                }, 
                                {
                                    "_id" : ObjectId("54784577e4b0a44213e36e2b")
                                }, 
                                {
                                    "_id" : ObjectId("5478456ee4b0a44213e36e27")
                                }
                            ], 
                            "soaTrashWarning" : "Understanding what you are spending and saving is crucial to helping you achieve your goals. Without this in place, you may be spending more than you can afford. ", 
                            "statuses" : [
                                {
                                    "personId" : "54784388e4b0a44213e36d5d", 
                                    "status" : "selected", 
                                    "whenUpdated" : NumberLong(1425017616062)
                                }, 
                                {
                                    "personId" : "54783da8e4b09cf5d82d4e11", 
                                    "status" : "selected", 
                                    "whenUpdated" : NumberLong(1425017616062)
                                }
                            ], 
                            "trashWarning" : "This option helps you get a picture of how much you're spending and how much you could save.\nAre you sure you don't want to take up this option now?\n\n", 
                            "weight" : NumberInt(1)
                        }, 

Update: I've made a few changes to the generator function, although I'm not sure they've really changed anything. I've also been stepping through both the print version and the yield version line by line in a debugger. The new code is below.

def dic_recurse(data, fields, counter, source_field):
    print 'Called'
    if isinstance(data, dict):
        for k, v in data.items():
            if isinstance(v, list):
                source_field += "_{0}".format(k)
                [dic_recurse(l, fields, counter, source_field) for l in v]
            elif isinstance(v, dict):
                source_field += "_{0}".format(k)
                dic_recurse(v, fields, counter, source_field)
            elif k in fields and isinstance(v, list) is False and isinstance(v, dict) is False:
                counter += 1
                yield "L{0}_{1}_{2}".format(counter, source_field, k.replace('_', ''))[1:], v
    elif isinstance(data, list):
        for l in data:
            dic_recurse(l, fields, counter, '')

When debugging, the key difference between the two versions seems to appear when this section of code is hit.

    elif isinstance(data, list):
        for l in data:
            dic_recurse(l, fields, counter, '')

When I test the yield version, the line that calls dic_recurse(l, fields, counter, '') gets hit, but the function doesn't seem to be called, because none of the print statements I put at the start of the function are hit. If I do the same with the print version, the code happily calls the function at that same point and runs back through the whole function.

I'm sure I'm misunderstanding something fundamental about generators and the use of the yield statement.
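The behaviour described in the question can be reproduced without MongoDB. The following is a minimal sketch, written for Python 3 (the question's code is Python 2), and the names `noisy_gen` and `calls` are hypothetical rather than taken from the original post. It suggests that calling a generator function only creates a generator object, and that the body does not start running until something iterates over that object.

```python
calls = []  # records each time the generator body actually starts running


def noisy_gen(items):
    """Hypothetical generator used only to illustrate lazy execution."""
    calls.append("started")
    for item in items:
        yield item


# Calling the function builds a generator object; the body has not run yet.
gen = noisy_gen([1, 2, 3])
started_after_call = list(calls)

# Iterating (here via list()) is what actually executes the body.
values = list(gen)
started_after_iteration = list(calls)

print(started_after_call)
print(values)
print(started_after_iteration)
```

If this reading is right, a bare recursive call such as dic_recurse(l, fields, counter, '') inside a generator creates a new generator and then discards it, which would explain why print statements at the top of the function never fire.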

Recommended answer

In lieu of any response, I wanted to post my updated solution in case it proves useful to anyone else.

I needed to add additional yield statements to the function so that the results of each recursive call to the generator function are iterated over and handed back to the caller. At least, that's how I've understood it. Happy to be corrected.

def dic_recurse(data, fields, counter, source_field):
    if isinstance(data, dict):
        counter += 1
        for k, v in data.items():
            if isinstance(v, list):
                for field_data in v:
                    for list_field in dic_recurse(field_data, fields, counter, source_field):
                        yield list_field
            elif isinstance(v, dict):
                for dic_field in dic_recurse(v, fields, counter, source_field):
                    yield dic_field
            elif k in fields and isinstance(v, list) is False and isinstance(v, dict) is False:
                yield counter, {"{0}_L{1}".format(k, counter): v}
    elif isinstance(data, list):
        counter += 1
        for list_item in data:
            for li2 in dic_recurse(list_item, fields, counter, ''):
                yield li2
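On Python 3.3 or later, nested `for ... yield` loops like the ones above can usually be written with `yield from`, which delegates to the sub-generator. The sketch below is not the poster's code: it assumes Python 3, uses a hypothetical sample document, and drops the redundant type checks in the last branch, but otherwise tries to keep the same logic.

```python
def dic_recurse(data, fields, counter, source_field):
    """Python 3 sketch of the answer's function using `yield from`."""
    if isinstance(data, dict):
        counter += 1
        for k, v in data.items():
            if isinstance(v, list):
                for field_data in v:
                    # Delegate to the recursive generator and re-yield its items.
                    yield from dic_recurse(field_data, fields, counter, source_field)
            elif isinstance(v, dict):
                yield from dic_recurse(v, fields, counter, source_field)
            elif k in fields:
                # v is neither a list nor a dict here, so it is a leaf value.
                yield counter, {"{0}_L{1}".format(k, counter): v}
    elif isinstance(data, list):
        counter += 1
        for list_item in data:
            yield from dic_recurse(list_item, fields, counter, '')


# Hypothetical sample document, loosely modelled on the data in the question.
sample = {
    "_id": "doc1",
    "modules": [
        {"_id": "FF", "name": "Foundations",
         "statuses": [{"personId": "p1", "status": "selected"}]},
    ],
}

results = list(dic_recurse(sample, ["_id", "name", "personId", "status"], 0, ''))
for level, field in results:
    print(level, field)
```

Each `yield from` line replaces one inner `for` loop plus its `yield`, so the values produced should be the same as with the explicit loops.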
