如何使用平面数据表中的嵌套记录构建JSON文件? [英] How to build a JSON file with nested records from a flat data table?

查看:76
本文介绍了如何使用平面数据表中的嵌套记录构建JSON文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种Python技术,以从熊猫数据框中的平面表构建嵌套的JSON文件.例如,大熊猫数据框表如何:

I'm looking for a Python technique to build a nested JSON file from a flat table in a pandas data frame. For example how could a pandas data frame table such as:

teamname  member firstname lastname  orgname         phone        mobile
0        1       0      John      Doe     Anon  916-555-1234                 
1        1       1      Jane      Doe     Anon  916-555-4321  916-555-7890   
2        2       0    Mickey    Moose  Moosers  916-555-0000  916-555-1111   
3        2       1     Minny    Moose  Moosers  916-555-2222

被获取并导出为如下所示的JSON:

be taken and exported to a JSON that looks like:

{
"teams": [
{
"teamname": "1",
"members": [
  {
    "firstname": "John", 
    "lastname": "Doe",
    "orgname": "Anon",
    "phone": "916-555-1234",
    "mobile": "",
  },
  {
    "firstname": "Jane",
    "lastname": "Doe",
    "orgname": "Anon",
    "phone": "916-555-4321",
    "mobile": "916-555-7890",
  }
]
},
{
"teamname": "2",
"members": [
  {
    "firstname": "Mickey",
    "lastname": "Moose",
    "orgname": "Moosers",
    "phone": "916-555-0000",
    "mobile": "916-555-1111",
  },
  {
    "firstname": "Minny",
    "lastname": "Moose",
    "orgname": "Moosers",
    "phone": "916-555-2222",
    "mobile": "",
  }
]
}       
]

}

我尝试通过创建一个dict字典并将其转储到JSON来做到这一点.这是我当前的代码:

I have tried doing this by creating a dict of dicts and dumping to JSON. This is my current code:

data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
memberDictTuple = [] 

for index, row in data.iterrows():
    dataRow = row
    rowDict = dict(zip(columnList[2:], dataRow[2:]))

    teamRowDict = {columnList[0]:int(dataRow[0])}

    memberId = tuple(row[1:2])
    memberId = memberId[0]

    teamName = tuple(row[0:1])
    teamName = teamName[0]

    memberDict1 = {int(memberId):rowDict}
    memberDict2 = {int(teamName):memberDict1}

    memberDictTuple.append(memberDict2)

memberDictTuple = tuple(memberDictTuple)
formattedJson = json.dumps(memberDictTuple, indent = 4, sort_keys = True)
print formattedJson

这将产生以下输出.每个项目都以正确的级别嵌套在团队名称" 1或2下,但是如果记录具有相同的团队名称,则应将它们嵌套在一起.如何解决此问题,使组名1和组名2各自嵌套2条记录?

This produces the following output. Each item is nested at the correct level under "teamname" 1 or 2, but records should be nested together if they have the same teamname. How can I fix this so that teamname 1 and teamname 2 each have 2 records nested within?

[
    {
        "1": {
            "0": {
                "email": "john.doe@wildlife.net", 
                "firstname": "John", 
                "lastname": "Doe", 
                "mobile": "none", 
                "orgname": "Anon", 
                "phone": "916-555-1234"
            }
        }
    }, 
    {
        "1": {
            "1": {
                "email": "jane.doe@wildlife.net", 
                "firstname": "Jane", 
                "lastname": "Doe", 
                "mobile": "916-555-7890", 
                "orgname": "Anon", 
                "phone": "916-555-4321"
            }
        }
    }, 
    {
        "2": {
            "0": {
                "email": "mickey.moose@wildlife.net", 
                "firstname": "Mickey", 
                "lastname": "Moose", 
                "mobile": "916-555-1111", 
                "orgname": "Moosers", 
                "phone": "916-555-0000"
            }
        }
    }, 
    {
        "2": {
            "1": {
                "email": "minny.moose@wildlife.net", 
                "firstname": "Minny", 
                "lastname": "Moose", 
                "mobile": "none", 
                "orgname": "Moosers", 
                "phone": "916-555-2222"
            }
        }
    }
]

推荐答案

这是可以工作并创建所需JSON格式的解决方案.首先,我将数据框按适当的列进行分组,然后为每个列标题/记录对创建字典(而不丢失数据顺序),而是将它们创建为元组列表,然后将列表转换为有序字典.为其他所有分组的两个列创建了另一个有序字典.为了使JSON转换产生正确的格式,列表和有序的dict之间必须进行精确的分层.另外请注意,转储为JSON时,必须将sort_keys设置为false,否则所有的Ordered Dicts都将重新排列为字母顺序.

This is the a solution that works and creates the desired JSON format. First, I grouped my dataframe by the appropriate columns, then instead of creating a dictionary (and losing data order) for each column heading/record pair, I created them as lists of tuples, then transformed the list into an Ordered Dict. Another Ordered Dict was created for the two columns that everything else was grouped by. Precise layering between lists and ordered dicts was necessary to for the JSON conversion to produce the correct format. Also note that when dumping to JSON, sort_keys must be set to false, or all your Ordered Dicts will be rearranged into alphabetical order.

import pandas
import json
from collections import OrderedDict

inputExcel = 'E:\\teams.xlsx'
exportJson = 'E:\\teams.json'

data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')

# This creates a tuple of column headings for later use matching them with column data
cols = []
columnList = list(data[0:])
for col in columnList:
    cols.append(str(col))
columnList = tuple(cols)

#This groups the dataframe by the 'teamname' and 'members' columns
grouped = data.groupby(['teamname', 'members']).first()

#This creates a reference to the index level of the groups
groupnames = data.groupby(["teamname", "members"]).grouper.levels
tm = (groupnames[0])

#Create a list to add team records to at the end of the first 'for' loop
teamsList = []

for teamN in tm:
    teamN = int(teamN)  #added this in to prevent TypeError: 1 is not JSON serializable
    tempList = []   #Create an temporary list to add each record to
    for index, row in grouped.iterrows():
        dataRow = row
        if index[0] == teamN:  #Select the record in each row of the grouped dataframe if its index matches the team number

            #In order to have the JSON records come out in the same order, I had to first create a list of tuples, then convert to and Ordered Dict
            rowDict = ([(columnList[2], dataRow[0]), (columnList[3], dataRow[1]), (columnList[4], dataRow[2]), (columnList[5], dataRow[3]), (columnList[6], dataRow[4]), (columnList[7], dataRow[5])])
            rowDict = OrderedDict(rowDict)
            tempList.append(rowDict)
    #Create another Ordered Dict to keep 'teamname' and the list of members from the temporary list sorted
    t = ([('teamname', str(teamN)), ('members', tempList)])
    t= OrderedDict(t)

    #Append the Ordered Dict to the emepty list of teams created earlier
    ListX = t
    teamsList.append(ListX)


#Create a final dictionary with a single item: the list of teams
teams = {"teams":teamsList} 

#Dump to JSON format
formattedJson = json.dumps(teams, indent = 1, sort_keys = False) #sort_keys MUST be set to False, or all dictionaries will be alphebetized
formattedJson = formattedJson.replace("NaN", '"NULL"') #"NaN" is the NULL format in pandas dataframes - must be replaced with "NULL" to be a valid JSON file
print formattedJson

#Export to JSON file
parsed = open(exportJson, "w")
parsed.write(formattedJson)

print"\n\nExport to JSON Complete"

这篇关于如何使用平面数据表中的嵌套记录构建JSON文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆