How to convert a nested JSON structure with varying lists (as dictionary values) to a DataFrame

Question

I converted a JSON into a DataFrame and ended up with a column 'Structure_value' whose values are lists of one or more dictionaries:

                   Structure_value
[{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}]
[{'Room': [6], 'Length': 22}]
[{'Room': [6,6], 'Length': 8}]

I need to split it into the following four columns:

Structure_value_room_1 Structure_value_length_1 Structure_value_room_2 Structure_value_length_2

The output should look like this:

   Structure_value_room_1  Structure_value_length_1  Structure_value_room_2  Structure_value_length_2
0                       6                         7                     6.0                       7.0
1                       6                        22                     NaN                       NaN
2                       6                         8                     6.0                       8.0

How can I handle cases where a single attribute holds multiple values in one list, so that those values have to be split into additional columns?

P.S.: I can handle data like [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}], but I cannot handle a case like [{'Room': [6,6], 'Length': 8}].

Recommended answer

I could not load your Structure_value representation as a single JSON file, and I don't know whether the rows come from separate files. So I used [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}] as file1, [{'Room': [6], 'Length': 22}] as file2, and [{'Room': [6,6], 'Length': 8}] as file3.
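To try the answer locally, the three inputs first have to exist as valid JSON files. The sketch below writes them to a temporary directory. The file names house1.json, house2.json and house3.json are borrowed from the calls shown in the results, and the assumption that file1, file2 and file3 map to them in that order is mine; `samples`, `out_dir` and `paths` are illustrative names.

```python
import json
import os
import tempfile

# Sample structures from the question. Mapping file1..file3 to
# house1.json..house3.json is an assumption, not stated in the answer.
samples = {
    'house1.json': [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
    'house2.json': [{'Room': [6], 'Length': 22}],
    'house3.json': [{'Room': [6, 6], 'Length': 8}],
}

out_dir = tempfile.mkdtemp()  # hypothetical location; any folder works
paths = {}
for name, structures in samples.items():
    path = os.path.join(out_dir, name)
    with open(path, 'w') as f:
        # json.dump writes double-quoted strings, i.e. valid JSON,
        # unlike the single-quoted Python repr shown in the question
        json.dump(structures, f)
    paths[name] = path
```

Note that the single-quoted form in the question is a Python literal, not JSON, which may be why it could not be loaded directly.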

import json

import pandas as pd

COLUMNS = ['Structure_value_room_1', 'Structure_value_length_1',
           'Structure_value_room_2', 'Structure_value_length_2']

# treat the irregular structures: 'Room' may hold one or several values
def process_structure(s):

    rooms = s.get('Room', [])
    if not isinstance(rooms, list):
        rooms = [rooms]

    # one (room, length) pair per room; the single 'Length' is repeated
    return [(room, s.get('Length')) for room in rooms]

# open one JSON file and build a one-row DataFrame
def treat_json(file):

    with open(file, 'r') as f:
        structures = json.load(f)

    load_df = []

    for struct in structures:
        for room, length in process_structure(struct):
            load_df.append(room)
            load_df.append(length)

    # pad with None if fewer than two (room, length) pairs were found;
    # more than two pairs will make the DataFrame constructor raise
    load_df += [None] * (len(COLUMNS) - len(load_df))

    return pd.DataFrame([load_df], columns=COLUMNS)

Here are the results:

treat_json('house3.json')
    Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                         8

[1 rows x 4 columns]

treat_json('house2.json')
    Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                      None

[1 rows x 4 columns]

treat_json('house1.json')

    Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                         7

[1 rows x 4 columns]
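If the data is already in a DataFrame column, as in the question, a similar idea can be applied without going through files. The following is only a sketch under assumptions: it rebuilds the question's 'Structure_value' column by hand, assumes every dictionary has both 'Room' and 'Length' keys, and uses a hypothetical helper `flatten` that repeats 'Length' for each value in 'Room'.

```python
import pandas as pd

# The column from the question, rebuilt by hand for illustration
df = pd.DataFrame({
    'Structure_value': [
        [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
        [{'Room': [6], 'Length': 22}],
        [{'Room': [6, 6], 'Length': 8}],
    ]
})

def flatten(structures):
    """Return [room, length, room, length, ...] for one cell.

    'Length' is repeated once per value found in 'Room'.
    """
    flat = []
    for s in structures:
        rooms = s['Room'] if isinstance(s['Room'], list) else [s['Room']]
        for room in rooms:
            flat.extend([room, s['Length']])
    return flat

# Rows of unequal length are padded with NaN by the DataFrame constructor
rows = df['Structure_value'].apply(flatten).tolist()
wide = pd.DataFrame(rows, index=df.index)

# Name the columns room_1, length_1, room_2, length_2, ...
n_pairs = wide.shape[1] // 2
wide.columns = [
    f'Structure_value_{kind}_{i}'
    for i in range(1, n_pairs + 1)
    for kind in ('room', 'length')
]

print(wide)
```

Because the number of column pairs is derived from the longest row, this variant is not limited to two rooms per row; whether that is desirable depends on the real data.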
