展平双嵌套JSON [英] Flatten double nested JSON

查看:75
本文介绍了展平双嵌套JSON的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试展平如下所示的JSON文件:

I am trying to flatten a JSON file that looks like this:

{
"teams": [
  {
    "teamname": "1",
    "members": [
      {
        "firstname": "John", 
        "lastname": "Doe",
        "orgname": "Anon",
        "phone": "916-555-1234",
        "mobile": "",
        "email": "john.doe@wildlife.net"
      },
      {
        "firstname": "Jane",
        "lastname": "Doe",
        "orgname": "Anon",
        "phone": "916-555-4321",
        "mobile": "916-555-7890",
        "email": "jane.doe@wildlife.net"
      }
    ]
  },
  {
    "teamname": "2",
    "members": [
      {
        "firstname": "Mickey",
        "lastname": "Moose",
        "orgname": "Moosers",
        "phone": "916-555-0000",
        "mobile": "916-555-1111",
        "email": "mickey.moose@wildlife.net"
      },
      {
        "firstname": "Minny",
        "lastname": "Moose",
        "orgname": "Moosers",
        "phone": "916-555-2222",
        "mobile": "",
        "email": "minny.moose@wildlife.net"
      }
    ]
  }       
]

}

我希望将其导出到excel表.我当前的代码是这样:

I wish to export this to an excel table. My current code is this:

from pandas.io.json import json_normalize
import json
import pandas as pd

inputFile = 'E:\\teams.json'
outputFile = 'E:\\teams.xlsx'

f = open(inputFile)
data = json.load(f)
f.close()

df = pd.DataFrame(data)

result1 = json_normalize(data, 'teams' )
print result1

结果如下:

members                                              teamname
0  [{u'firstname': u'John', u'phone': u'916-555-...        1
1  [{u'firstname': u'Mickey', u'phone': u'916-555-...      2

每行内嵌套2个成员的数据.我想要一个输出表,其中显示所有4个成员的数据及其关联的团队名称.

There are 2 members's data nested within each row. I would like to have an output table that displays all 4 members' data plus their associated teamname.

推荐答案

这是一种实现方法.应该给你一些想法.

This is one way to do it. Should give you some ideas.

df = pd.concat(
    [
        pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
    ], keys=[t['teamname'] for t in data['teams']]
)

                                     0                         1
1 email          john.doe@wildlife.net     jane.doe@wildlife.net
  firstname                       John                      Jane
  lastname                         Doe                       Doe
  mobile                                            916-555-7890
  orgname                         Anon                      Anon
  phone                   916-555-1234              916-555-4321
2 email      mickey.moose@wildlife.net  minny.moose@wildlife.net
  firstname                     Mickey                     Minny
  lastname                       Moose                     Moose
  mobile                  916-555-1111                          
  orgname                      Moosers                   Moosers
  phone                   916-555-0000              916-555-2222

要获得一个以团队名称和成员作为行的漂亮表,所有属性均在列中:

To get a nice table with team name and members as rows, all attributes in columns:

df.index.levels[0].name = 'teamname'
df.columns.name = 'member'

df.T.stack(0).swaplevel(0, 1).sort_index()

要将团队名称和成员作为实际列,只需重置索引即可.

To get team name and member as actual columns, just reset the index.

df.index.levels[0].name = 'teamname'
df.columns.name = 'member'

df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()

import json
import pandas as pd

json_text = """{
"teams": [
  {
    "teamname": "1",
    "members": [
      {
        "firstname": "John", 
        "lastname": "Doe",
        "orgname": "Anon",
        "phone": "916-555-1234",
        "mobile": "",
        "email": "john.doe@wildlife.net"
      },
      {
        "firstname": "Jane",
        "lastname": "Doe",
        "orgname": "Anon",
        "phone": "916-555-4321",
        "mobile": "916-555-7890",
        "email": "jane.doe@wildlife.net"
      }
    ]
  },
  {
    "teamname": "2",
    "members": [
      {
        "firstname": "Mickey",
        "lastname": "Moose",
        "orgname": "Moosers",
        "phone": "916-555-0000",
        "mobile": "916-555-1111",
        "email": "mickey.moose@wildlife.net"
      },
      {
        "firstname": "Minny",
        "lastname": "Moose",
        "orgname": "Moosers",
        "phone": "916-555-2222",
        "mobile": "",
        "email": "minny.moose@wildlife.net"
      }
    ]
  }       
]
}"""


data = json.loads(json_text)

df = pd.concat(
    [
        pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
    ], keys=[t['teamname'] for t in data['teams']]
)

df.index.levels[0].name = 'teamname'
df.columns.name = 'member'

df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()

这篇关于展平双嵌套JSON的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆