Convert Pandas Dataframe to nested JSON

Problem description

I am new to Python and Pandas. I am trying to convert a Pandas Dataframe to nested JSON. The .to_json() function doesn't give me enough flexibility for my aim.

Here are some data points of the dataframe (in csv, comma separated):

,ID,Location,Country,Latitude,Longitude,timestamp,tide  
0,1,BREST,FRA,48.383,-4.495,1807-01-01,6905.0  
1,1,BREST,FRA,48.383,-4.495,1807-02-01,6931.0  
2,1,BREST,FRA,48.383,-4.495,1807-03-01,6896.0  
3,1,BREST,FRA,48.383,-4.495,1807-04-01,6953.0  
4,1,BREST,FRA,48.383,-4.495,1807-05-01,7043.0  
2508,7,CUXHAVEN 2,DEU,53.867,8.717,1843-01-01,7093.0  
2509,7,CUXHAVEN 2,DEU,53.867,8.717,1843-02-01,6688.0  
2510,7,CUXHAVEN 2,DEU,53.867,8.717,1843-03-01,6493.0  
2511,7,CUXHAVEN 2,DEU,53.867,8.717,1843-04-01,6723.0  
2512,7,CUXHAVEN 2,DEU,53.867,8.717,1843-05-01,6533.0  
4525,9,MAASSLUIS,NLD,51.918,4.25,1848-02-01,6880.0  
4526,9,MAASSLUIS,NLD,51.918,4.25,1848-03-01,6700.0  
4527,9,MAASSLUIS,NLD,51.918,4.25,1848-04-01,6775.0  
4528,9,MAASSLUIS,NLD,51.918,4.25,1848-05-01,6580.0  
4529,9,MAASSLUIS,NLD,51.918,4.25,1848-06-01,6685.0  
6540,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-07-01,6957.0  
6541,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-08-01,6944.0  
6542,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-09-01,7084.0  
6543,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-10-01,6898.0  
6544,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-11-01,6859.0  
8538,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-07-01,6909.0  
8539,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-08-01,6940.0  
8540,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-09-01,6961.0  
8541,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-10-01,6952.0  
8542,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-11-01,6952.0  

There is a lot of repetitive information and I would like to have a JSON like this:

[
{
    "ID": 1,
    "Location": "BREST",
    "Latitude": 48.383,
    "Longitude": -4.495,
    "Country": "FRA",
    "Tide-Data": {
        "1807-02-01": 6931,
        "1807-03-01": 6896,
        "1807-04-01": 6953,
        "1807-05-01": 7043
    }
},
{
    "ID": 5,
    "Location": "HOLYHEAD",
    "Latitude": 53.31399999999999,
    "Longitude": -4.62,
    "Country": "GBR",
    "Tide-Data": {
        "1807-02-01": 6931,
        "1807-03-01": 6896,
        "1807-04-01": 6953,
        "1807-05-01": 7043
    }
}
]

How could I achieve this?

EDIT:

Code to reproduce the dataframe:

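import json
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

# input json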
json_str = '[{"ID":1,"Location":"BREST","Country":"FRA","Latitude":48.383,"Longitude":-4.495,"timestamp":"1807-01-01","tide":6905},{"ID":1,"Location":"BREST","Country":"FRA","Latitude":48.383,"Longitude":-4.495,"timestamp":"1807-02-01","tide":6931},{"ID":1,"Location":"BREST","Country":"DEU","Latitude":48.383,"Longitude":-4.495,"timestamp":"1807-03-01","tide":6896},{"ID":7,"Location":"CUXHAVEN 2","Country":"DEU","Latitude":53.867,"Longitude":-8.717,"timestamp":"1843-01-01","tide":7093},{"ID":7,"Location":"CUXHAVEN 2","Country":"DEU","Latitude":53.867,"Longitude":-8.717,"timestamp":"1843-02-01","tide":6688},{"ID":7,"Location":"CUXHAVEN 2","Country":"DEU","Latitude":53.867,"Longitude":-8.717,"timestamp":"1843-03-01","tide":6493}]'

# load json object
data_list = json.loads(json_str)

# create dataframe
df = json_normalize(data_list, None, None)

Solution

UPDATE:

j = (df.groupby(['ID','Location','Country','Latitude','Longitude'])
       .apply(lambda x: x[['timestamp','tide']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'Tide-Data'})
       .to_json(orient='records'))
     

Result (formatted):

In [103]: print(json.dumps(json.loads(j), indent=2, sort_keys=True))
[
  {
    "Country": "FRA",
    "ID": 1,
    "Latitude": 48.383,
    "Location": "BREST",
    "Longitude": -4.495,
    "Tide-Data": [
      {
        "tide": 6905.0,
        "timestamp": "1807-01-01"
      },
      {
        "tide": 6931.0,
        "timestamp": "1807-02-01"
      },
      {
        "tide": 6896.0,
        "timestamp": "1807-03-01"
      },
      {
        "tide": 6953.0,
        "timestamp": "1807-04-01"
      },
      {
        "tide": 7043.0,
        "timestamp": "1807-05-01"
      }
    ]
  },
  {
    "Country": "DEU",
    "ID": 7,
    "Latitude": 53.867,
    "Location": "CUXHAVEN 2",
    "Longitude": 8.717,
    "Tide-Data": [
      {
        "tide": 7093.0,
        "timestamp": "1843-01-01"
      },
      {
        "tide": 6688.0,
        "timestamp": "1843-02-01"
      },
      {
        "tide": 6493.0,
        "timestamp": "1843-03-01"
      },
      {
        "tide": 6723.0,
        "timestamp": "1843-04-01"
      },
      {
        "tide": 6533.0,
        "timestamp": "1843-05-01"
      }
    ]
  },
  {
    "Country": "DEU",
    "ID": 8,
    "Latitude": 53.899,
    "Location": "WISMAR 2",
    "Longitude": 11.458,
    "Tide-Data": [
      {
        "tide": 6957.0,
        "timestamp": "1848-07-01"
      },
      {
        "tide": 6944.0,
        "timestamp": "1848-08-01"
      },
      {
        "tide": 7084.0,
        "timestamp": "1848-09-01"
      },
      {
        "tide": 6898.0,
        "timestamp": "1848-10-01"
      },
      {
        "tide": 6859.0,
        "timestamp": "1848-11-01"
      }
    ]
  },
  {
    "Country": "NLD",
    "ID": 9,
    "Latitude": 51.918,
    "Location": "MAASSLUIS",
    "Longitude": 4.25,
    "Tide-Data": [
      {
        "tide": 6880.0,
        "timestamp": "1848-02-01"
      },
      {
        "tide": 6700.0,
        "timestamp": "1848-03-01"
      },
      {
        "tide": 6775.0,
        "timestamp": "1848-04-01"
      },
      {
        "tide": 6580.0,
        "timestamp": "1848-05-01"
      },
      {
        "tide": 6685.0,
        "timestamp": "1848-06-01"
      }
    ]
  },
  {
    "Country": "USA",
    "ID": 10,
    "Latitude": 37.807,
    "Location": "SAN FRANCISCO",
    "Longitude": -122.465,
    "Tide-Data": [
      {
        "tide": 6909.0,
        "timestamp": "1854-07-01"
      },
      {
        "tide": 6940.0,
        "timestamp": "1854-08-01"
      },
      {
        "tide": 6961.0,
        "timestamp": "1854-09-01"
      },
      {
        "tide": 6952.0,
        "timestamp": "1854-10-01"
      },
      {
        "tide": 6952.0,
        "timestamp": "1854-11-01"
      }
    ]
  }
]
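
If the chained expression above is hard to follow, the same list-of-records shape can be built with an explicit loop over the groups. This is only a sketch under a few assumptions: the helper name `nest_records` is made up for illustration, the small dataframe is a stand-in for the one in the question, and the `.item()` call is there because group keys may come back as NumPy scalars, which the standard `json` module does not always accept.

```python
import json
import pandas as pd

# Small stand-in for the question's dataframe (two stations, two readings each).
df = pd.DataFrame({
    "ID": [1, 1, 7, 7],
    "Location": ["BREST", "BREST", "CUXHAVEN 2", "CUXHAVEN 2"],
    "Country": ["FRA", "FRA", "DEU", "DEU"],
    "Latitude": [48.383, 48.383, 53.867, 53.867],
    "Longitude": [-4.495, -4.495, 8.717, 8.717],
    "timestamp": ["1807-01-01", "1807-02-01", "1843-01-01", "1843-02-01"],
    "tide": [6905.0, 6931.0, 7093.0, 6688.0],
})

KEYS = ["ID", "Location", "Country", "Latitude", "Longitude"]

def nest_records(frame, keys=KEYS, nested_name="Tide-Data"):
    """Hypothetical helper: one dict per station, readings as a list of records."""
    result = []
    # Grouping by a list of columns yields (tuple_of_key_values, sub_frame) pairs.
    for key_values, group in frame.groupby(keys):
        # Key values may be NumPy scalars; .item() converts them to plain Python.
        entry = {k: (v.item() if hasattr(v, "item") else v)
                 for k, v in zip(keys, key_values)}
        entry[nested_name] = group[["timestamp", "tide"]].to_dict("records")
        result.append(entry)
    return result

nested = nest_records(df)
print(json.dumps(nested, indent=2))
```

The loop is more verbose than the one-liner, but it makes it easy to rename keys or change the nested structure per group.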

OLD answer:

You can do it using groupby(), apply() and to_json() methods:

j = (df.groupby(['ID','Location','Country','Latitude','Longitude'], as_index=False)
       .apply(lambda x: dict(zip(x.timestamp,x.tide)))
       .reset_index()
       .rename(columns={0:'Tide-Data'})
       .to_json(orient='records'))

Output:

In [112]: print(json.dumps(json.loads(j), indent=2, sort_keys=True))
[
  {
    "Country": "FRA",
    "ID": 1,
    "Latitude": 48.383,
    "Location": "BREST",
    "Longitude": -4.495,
    "Tide-Data": {
      "1807-01-01": 6905.0,
      "1807-02-01": 6931.0,
      "1807-03-01": 6896.0,
      "1807-04-01": 6953.0,
      "1807-05-01": 7043.0
    }
  },
  {
    "Country": "DEU",
    "ID": 7,
    "Latitude": 53.867,
    "Location": "CUXHAVEN 2",
    "Longitude": 8.717,
    "Tide-Data": {
      "1843-01-01": 7093.0,
      "1843-02-01": 6688.0,
      "1843-03-01": 6493.0,
      "1843-04-01": 6723.0,
      "1843-05-01": 6533.0
    }
  },
  {
    "Country": "DEU",
    "ID": 8,
    "Latitude": 53.899,
    "Location": "WISMAR 2",
    "Longitude": 11.458,
    "Tide-Data": {
      "1848-07-01": 6957.0,
      "1848-08-01": 6944.0,
      "1848-09-01": 7084.0,
      "1848-10-01": 6898.0,
      "1848-11-01": 6859.0
    }
  },
  {
    "Country": "NLD",
    "ID": 9,
    "Latitude": 51.918,
    "Location": "MAASSLUIS",
    "Longitude": 4.25,
    "Tide-Data": {
      "1848-02-01": 6880.0,
      "1848-03-01": 6700.0,
      "1848-04-01": 6775.0,
      "1848-05-01": 6580.0,
      "1848-06-01": 6685.0
    }
  },
  {
    "Country": "USA",
    "ID": 10,
    "Latitude": 37.807,
    "Location": "SAN FRANCISCO",
    "Longitude": -122.465,
    "Tide-Data": {
      "1854-07-01": 6909.0,
      "1854-08-01": 6940.0,
      "1854-09-01": 6961.0,
      "1854-10-01": 6952.0,
      "1854-11-01": 6952.0
    }
  }
]
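
When the data is already a flat list of records (like `data_list` in the question, or the result of `df.to_dict('records')`), the timestamp-keyed shape can also be built with plain Python and no pandas at all. A minimal sketch, assuming the question's column names; `nest_by_timestamp` is a hypothetical helper name and the two-station sample is made up from the question's data.

```python
import json

KEYS = ("ID", "Location", "Country", "Latitude", "Longitude")

def nest_by_timestamp(records, keys=KEYS, nested_name="Tide-Data"):
    """Hypothetical helper: group flat records and key the readings by timestamp."""
    grouped = {}
    for row in records:
        group_key = tuple(row[k] for k in keys)
        if group_key not in grouped:
            # First time this station is seen: copy its identifying fields once.
            grouped[group_key] = {k: row[k] for k in keys}
            grouped[group_key][nested_name] = {}
        grouped[group_key][nested_name][row["timestamp"]] = row["tide"]
    # Dicts preserve insertion order, so stations appear in first-seen order.
    return list(grouped.values())

records = [
    {"ID": 1, "Location": "BREST", "Country": "FRA", "Latitude": 48.383,
     "Longitude": -4.495, "timestamp": "1807-01-01", "tide": 6905},
    {"ID": 1, "Location": "BREST", "Country": "FRA", "Latitude": 48.383,
     "Longitude": -4.495, "timestamp": "1807-02-01", "tide": 6931},
    {"ID": 7, "Location": "CUXHAVEN 2", "Country": "DEU", "Latitude": 53.867,
     "Longitude": 8.717, "timestamp": "1843-01-01", "tide": 7093},
]

nested = nest_by_timestamp(records)
print(json.dumps(nested, indent=2))
```

As with `dict(zip(...))`, a timestamp that occurs twice within one station keeps only the last value, so this shape only suits data with unique timestamps per group.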

PS: if you don't care about indentation, you can write directly to a JSON file:

(df.groupby(['ID','Location','Country','Latitude','Longitude'], as_index=False)
   .apply(lambda x: dict(zip(x.timestamp,x.tide)))
   .reset_index()
   .rename(columns={0:'Tide-Data'})
   .to_json('/path/to/file_name.json', orient='records'))
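
If you do want an indented file, one option is to take the JSON string `j` from above, parse it and write it back out with the standard `json` module. A small sketch; `write_pretty_json` is a hypothetical helper, the sample string stands in for `j`, and a temporary directory is used here instead of a real path.

```python
import json
import os
import tempfile

def write_pretty_json(json_text, path, indent=2):
    """Hypothetical helper: re-serialize a JSON string to `path` with indentation."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(json.loads(json_text), fh, indent=indent, sort_keys=True)

# Stand-in for the string `j` produced by to_json(orient='records').
j = '[{"ID":1,"Location":"BREST","Tide-Data":{"1807-01-01":6905.0}}]'

out_path = os.path.join(tempfile.mkdtemp(), "file_name.json")
write_pretty_json(j, out_path)

# Read the file back to check what was written.
with open(out_path, encoding="utf-8") as fh:
    written = fh.read()
print(written)
```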
