对pandas数据框的深度嵌套JSON响应 [英] Deeply nested JSON response to pandas dataframe

查看:70
本文介绍了对pandas数据框的深度嵌套JSON响应的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是python/pandas的新手,在将嵌套的JSON转换为pandas数据框时遇到一些问题.我正在将查询发送到数据库并返回JSON字符串.

I am new to python/pandas and I am having some issues with converting a nested JSON to pandas dataframe. I am sending a query to a database and getting a JSON string back.

这是一个深度嵌套的JSON字符串,其中包含多个数组.来自数据库的响应包含数千行.这是JSON字符串中一行的一般结构:

It's a deeply nested JSON string that contains several arrays. The response from the database contains thousands of rows. Here is the general structure of one row in the JSON string:

{
  "ID": "123456",
  "profile": {
    "criteria": [
      {
        "type": "type1",
        "name": "name1",
        "value": "7",
        "properties": []
      },
      {
        "type": "type2",
        "name": "name2",
        "value": "6",
        "properties": [
          {
            "type": "MAX",
            "name": "",
            "value": "100"
          },
          {
            "type": "MIN",
            "name": "",
            "value": "5"
          }
        ]
      },
      {
        "type": "type3",
        "name": "name3",
        "value": "5",
        "properties": []
      }
    ]
  }
}  
{
  "ID": "456789",
  "profile": {
    "criteria": [
      {
        "type": "type4",
        "name": "name4",
        "value": "6",
        "properties": []
      }
    ]
  }
}

我想使用python pandas扁平化此JSON字符串.我在使用json_normalize时遇到问题,因为这是一个深度嵌套的JSON字符串:

I want to flatten this JSON string using python pandas. I had problems using json_normalize since this is a deeply nested JSON string:

from cassandra.cluster import Cluster
import pandas as pd
from pandas.io.json import json_normalize

def pandas_factory(colnames, rows):
    return pd.DataFrame(rows, columns=colnames)

cluster = Cluster(['xxx.xx.x.xx'], port=yyyy)
session = cluster.connect('nnnn')

session.row_factory = pandas_factory

json_string = session.execute('select json ......')
df = json_string ._current_rows
df_normalized= json_normalize(df)
print(df_normalized)

运行此代码时,出现键盘错误:

When i run this code, i get a Key error:

KeyError: 0

我需要帮助将此JSON字符串转换为只有某些选定列的数据框,看起来像这样:(其余数据可以跳过)

I need help converting this JSON string to a dataframe with only some selected columns that looks something like this: (The rest of the data can be skipped)

ID        |   criteria   | type   |   name   |   value   |

123456          1          type1      name1        7      
123456          2          type2      name2        6  
123456          3          type3      name3        5    
456789          1          type4      name4        6

我试图在此处找到类似的问题,但似乎无法将其应用于我的JSON字符串.

I tried to find similar problems on here but I can't seem to apply it to my JSON string.

感谢您的帮助! :)

reture的json字符串是查询响应对象:ResultSet .我认为这就是为什么我在使用时遇到一些问题:

The json string that is retured is a query response object: ResultSet . I think thats why I'm having some issues with using:

json_string= session.execute('select json profile from visning')
temp = json.loads(json_string)

并得到错误:

TypeError: the JSON object must be str, not 'ResultSet'

编辑#2:

只是看我正在使用什么,我使用以下命令打印了结果查询:

Just to see what I'm working with, I printed the the result query by using:

for line in session.execute('select json.....'):
    print(line)

并得到了这样的内容:

Row(json='{"ID": null, "profile": null}')
Row(json='{"ID": "123", "profile": {"criteria": [{"type": "type1", "name": "name1", "value": "10", "properties": []}, {"type": "type2", "name": "name2", "value": "50", "properties": []}, {"type": "type3", "name": "name3", "value": "40", "properties": []}]}}')
Row(json='{"ID": "456", "profile": {"criteria": []}}')
Row(json='{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}')
Row(json='{"ID": "987", "profile": {"criteria": [{"type": "type5", "name": "name5", "value": "70", "properties": []}, {"type": "type6", "name": "name6", "value": "60", "properties": []}, {"type": "type7", "name": "name7", "value": "2", "properties": []}, {"type": "type8", "name": "name8", "value": "7", "properties": []}]}}')

我遇到的问题是将此结构转换为可以在json.loads()中使用的json字符串:

The issue I'm having is converting this structure to a json string that can be used in json.loads():

json_string= session.execute('select json profile from visning')
json_list = list(json_string)
string= ''.join(list(map(str, json_list)))
temp = json.loads(string) <-- creates error json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

编辑#3:

按照下面注释中的要求进行打印

As requested below in the comments, printing

for line in session.execute('select json.....'):
print((line.json))

获取输出:

{"ID": null, "profile": null}
{"ID": "123", "profile": {"criteria": [{"type": "type1", "name": "name1", "value": "10", "properties": []}, {"type": "type2", "name": "name2", "value": "50", "properties": []}, {"type": "type3", "name": "name3", "value": "40", "properties": []}]}}
{"ID": "456", "profile": {"criteria": []}}
{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}
{"ID": "987", "profile": {"criteria": [{"type": "type5", "name": "name5", "value": "70", "properties": []}, {"type": "type6", "name": "name6", "value": "60", "properties": []}, {"type": "type7", "name": "name7", "value": "2", "properties": []}, {"type": "type8", "name": "name8", "value": "7", "properties": []}]}}

推荐答案

一个硬编码示例...

A hardcoded example...

import pandas as pd

temp = [{
  "ID": "123456",
  "profile": {
    "criteria": [
      {
        "type": "type1",
        "name": "name1",
        "value": "7",
        "properties": []
      },
      {
        "type": "type2",
        "name": "name2",
        "value": "6",
        "properties": [
          {
            "type": "MAX",
            "name": "",
            "value": "100"
          },
          {
            "type": "MIN",
            "name": "",
            "value": "5"
          }
        ]
      },
      {
        "type": "type3",
        "name": "name3",
        "value": "5",
        "properties": []
      }
    ]
  }
},
{
  "ID": "456789",
  "profile": {
    "criteria": [
      {
        "type": "type4",
        "name": "name4",
        "value": "6",
        "properties": []
      }
    ]
  }
}]

cols = ['ID', 'criteria', 'type', 'name', 'value']

rows = []
for data in temp:
    data_id = data['ID']
    criteria = data['profile']['criteria']
    for d in criteria:
        rows.append([data_id, criteria.index(d)+1, *list(d.values())[:-1]])

df = pd.DataFrame(rows, columns=cols)

这绝不是优雅.这更像是一种快速而肮脏的解决方案,因为我不知道JSON数据的确切格式-但是,根据您提供的内容,我上面的代码将生成所需的DataFrame.

This is by no means elegant. It is more of a quick and dirty solution, as I don't know how the JSON data is exactly formatted - however, based on what you've provided, my code above will produce the desired DataFrame.

       ID  criteria   type   name value
0  123456         1  type1  name1     7
1  123456         2  type2  name2     6
2  123456         3  type3  name3     5
3  456789         1  type4  name4     6

此外,如果您需要加载" JSON数据,则可以使用json库,如下所示:

Additionally, if you need to 'load' the JSON data, you can use the json library like so:

import json

temp = json.loads(json_string)

# Or from a file...
with open('some_json.json') as json_file:
    temp = json.load(json_file)

请注意json.loadsjson.load之间的区别.

这篇关于对pandas数据框的深度嵌套JSON响应的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆