How to build a JSON file with nested records from a flat data table?


Question


I'm looking for a Python technique to build a nested JSON file from a flat table in a pandas DataFrame. For example, how could a pandas DataFrame such as:

   teamname  member firstname lastname  orgname         phone        mobile
0         1       0      John      Doe     Anon  916-555-1234
1         1       1      Jane      Doe     Anon  916-555-4321  916-555-7890
2         2       0    Mickey    Moose  Moosers  916-555-0000  916-555-1111
3         2       1     Minny    Moose  Moosers  916-555-2222

be exported to JSON that looks like this:

{
  "teams": [
    {
      "teamname": "1",
      "members": [
        {
          "firstname": "John",
          "lastname": "Doe",
          "orgname": "Anon",
          "phone": "916-555-1234",
          "mobile": ""
        },
        {
          "firstname": "Jane",
          "lastname": "Doe",
          "orgname": "Anon",
          "phone": "916-555-4321",
          "mobile": "916-555-7890"
        }
      ]
    },
    {
      "teamname": "2",
      "members": [
        {
          "firstname": "Mickey",
          "lastname": "Moose",
          "orgname": "Moosers",
          "phone": "916-555-0000",
          "mobile": "916-555-1111"
        },
        {
          "firstname": "Minny",
          "lastname": "Moose",
          "orgname": "Moosers",
          "phone": "916-555-2222",
          "mobile": ""
        }
      ]
    }
  ]
}

I have tried doing this by creating a dict of dicts and dumping to JSON. This is my current code:

data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
memberDictTuple = [] 

for index, row in data.iterrows():
    dataRow = row
    rowDict = dict(zip(columnList[2:], dataRow[2:]))

    teamRowDict = {columnList[0]:int(dataRow[0])}

    memberId = tuple(row[1:2])
    memberId = memberId[0]

    teamName = tuple(row[0:1])
    teamName = teamName[0]

    memberDict1 = {int(memberId):rowDict}
    memberDict2 = {int(teamName):memberDict1}

    memberDictTuple.append(memberDict2)

memberDictTuple = tuple(memberDictTuple)
formattedJson = json.dumps(memberDictTuple, indent = 4, sort_keys = True)
print(formattedJson)

This produces the following output. Each item is nested at the correct level under "teamname" 1 or 2, but records should be nested together if they have the same teamname. How can I fix this so that teamname 1 and teamname 2 each have 2 records nested within?

[
    {
        "1": {
            "0": {
                "email": "john.doe@wildlife.net", 
                "firstname": "John", 
                "lastname": "Doe", 
                "mobile": "none", 
                "orgname": "Anon", 
                "phone": "916-555-1234"
            }
        }
    }, 
    {
        "1": {
            "1": {
                "email": "jane.doe@wildlife.net", 
                "firstname": "Jane", 
                "lastname": "Doe", 
                "mobile": "916-555-7890", 
                "orgname": "Anon", 
                "phone": "916-555-4321"
            }
        }
    }, 
    {
        "2": {
            "0": {
                "email": "mickey.moose@wildlife.net", 
                "firstname": "Mickey", 
                "lastname": "Moose", 
                "mobile": "916-555-1111", 
                "orgname": "Moosers", 
                "phone": "916-555-0000"
            }
        }
    }, 
    {
        "2": {
            "1": {
                "email": "minny.moose@wildlife.net", 
                "firstname": "Minny", 
                "lastname": "Moose", 
                "mobile": "none", 
                "orgname": "Moosers", 
                "phone": "916-555-2222"
            }
        }
    }
]

Solution

This is a solution that works and creates the desired JSON format. First, I grouped my dataframe by the appropriate columns. Then, instead of creating a plain dictionary for each column heading/record pair (and losing the data order), I built each record as a list of tuples and converted that list into an OrderedDict. Another OrderedDict was created for the two columns that everything else is grouped by. Precise layering between lists and OrderedDicts was necessary for the JSON conversion to produce the correct format. Also note that when dumping to JSON, sort_keys must be set to False, or the keys of all your OrderedDicts will be rearranged into alphabetical order.
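The ordering point above can be illustrated in isolation. The following is only a minimal sketch, not part of the original answer: the record is a made-up example whose field names are borrowed from the question's table, and it simply shows how `sort_keys` affects an OrderedDict built from a list of tuples.

```python
import json
from collections import OrderedDict

# Hypothetical record; the field names are borrowed from the question's table.
pairs = [("firstname", "John"), ("phone", "916-555-1234"), ("mobile", "")]
record = OrderedDict(pairs)

# sort_keys=False keeps the insertion order of the OrderedDict.
kept = json.dumps(record, sort_keys=False)

# sort_keys=True rearranges the keys alphabetically.
alphabetized = json.dumps(record, sort_keys=True)

print(kept)
print(alphabetized)
```

With `sort_keys=False` the fields come out as firstname, phone, mobile; with `sort_keys=True` they come out as firstname, mobile, phone.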

import pandas
import json
from collections import OrderedDict

# Raw strings keep the backslash literal; in a normal string '\t' would be read as a tab
inputExcel = r'E:\teams.xlsx'
exportJson = r'E:\teams.json'

# Note: newer pandas versions name this argument sheet_name instead of sheetname
data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')

# This creates a tuple of column headings for later use matching them with column data
cols = []
columnList = list(data[0:])
for col in columnList:
    cols.append(str(col))
columnList = tuple(cols)

#This groups the dataframe by the 'teamname' and 'members' columns
grouped = data.groupby(['teamname', 'members']).first()

#This creates a reference to the index level of the groups
groupnames = data.groupby(["teamname", "members"]).grouper.levels
tm = (groupnames[0])

#Create a list to add team records to at the end of the first 'for' loop
teamsList = []

for teamN in tm:
    teamN = int(teamN)  #Convert to a plain int to prevent "TypeError: 1 is not JSON serializable"
    tempList = []   #Temporary list that collects this team's member records
    for index, row in grouped.iterrows():
        dataRow = row
        if index[0] == teamN:  #Select the row only if its index matches the team number

            #To keep the JSON fields in column order, first build a list of tuples, then convert it to an OrderedDict
            rowDict = ([(columnList[2], dataRow[0]), (columnList[3], dataRow[1]), (columnList[4], dataRow[2]), (columnList[5], dataRow[3]), (columnList[6], dataRow[4]), (columnList[7], dataRow[5])])
            rowDict = OrderedDict(rowDict)
            tempList.append(rowDict)
    #Create another OrderedDict so that 'teamname' stays ahead of the list of members
    t = ([('teamname', str(teamN)), ('members', tempList)])
    t = OrderedDict(t)

    #Append the OrderedDict to the empty list of teams created earlier
    ListX = t
    teamsList.append(ListX)


#Create a final dictionary with a single item: the list of teams
teams = {"teams":teamsList} 

#Dump to JSON format
formattedJson = json.dumps(teams, indent = 1, sort_keys = False) #sort_keys MUST be set to False, or all dictionaries will be alphabetized
formattedJson = formattedJson.replace("NaN", '"NULL"') #NaN is how pandas represents missing values; a bare NaN is not valid JSON, so it is replaced with the string "NULL"
print(formattedJson)

#Export to JSON file
parsed = open(exportJson, "w")
parsed.write(formattedJson)
parsed.close()

print("\n\nExport to JSON Complete")
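For comparison, the same nesting can be sketched more compactly by iterating over the groups directly. This is not the code from the answer above, only an illustrative variant under stated assumptions: the DataFrame is built in memory from the question's sample rows instead of being read from Excel, the second column is named `members` as in the answer's code, `build_teams` is a hypothetical helper name, and missing values are written as empty strings to match the desired output in the question.

```python
import json
from collections import OrderedDict

import pandas as pd

# In-memory stand-in for the spreadsheet (rows taken from the question's table).
data = pd.DataFrame(
    [
        [1, 0, "John", "Doe", "Anon", "916-555-1234", None],
        [1, 1, "Jane", "Doe", "Anon", "916-555-4321", "916-555-7890"],
        [2, 0, "Mickey", "Moose", "Moosers", "916-555-0000", "916-555-1111"],
        [2, 1, "Minny", "Moose", "Moosers", "916-555-2222", None],
    ],
    columns=["teamname", "members", "firstname", "lastname",
             "orgname", "phone", "mobile"],
)


def build_teams(frame):
    """Nest member records under their team, keeping the column order."""
    member_cols = [c for c in frame.columns if c not in ("teamname", "members")]
    teams = []
    # sort=False keeps the teams in order of first appearance.
    for team, rows in frame.groupby("teamname", sort=False):
        members = []
        for _, row in rows.iterrows():
            # Missing values become "" (an assumption based on the desired output).
            members.append(OrderedDict(
                (col, "" if pd.isna(row[col]) else row[col])
                for col in member_cols
            ))
        teams.append(OrderedDict([("teamname", str(team)), ("members", members)]))
    return {"teams": teams}


nested = build_teams(data)
formatted = json.dumps(nested, indent=1, sort_keys=False)
print(formatted)
```

Replacing missing values before calling `json.dumps` avoids the string replacement on the finished JSON text, which would also alter any legitimate field value that happens to contain "NaN".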
