Pandas json_normalize fails with null values in JSON

Question

I have the JSON below, which I get from an external web service:

text = """
     [{
        "id": "1",
        "name": "abc",
        "address": {
            "flat": "123",
            "city": "paris",
            "street": null
        },
        "error": null
     }]
"""

Now I want to create a DataFrame from this JSON. When I try the following:

from pandas.io.json import json_normalize
import json
import pandas as pd

resp_json = json.loads(text)
response = json_normalize(resp_json)

But this gives me the error below:

Error at response = json_normalize(resp_json) KeyError : 'street'

I believe this happens because the street attribute has a null value. How can this be resolved?

If I do the following, the error goes away, but ideally this is not the right solution:

text = text.replace('"street":null','"street":""')

NOTE: When I use Python version 3.6.3 :: Anaconda Inc. and pandas version 0.20.3, I do not see this issue and json_normalize works properly. This is my local machine setup.

On the production machine we have Python 3.5.1 and pandas 0.23.0. There we encounter the issue above.

Recommended answer

This appears to be a bug in the latest version of pandas (0.23.0 at the time of writing):

https://github.com/pandas-dev/pandas/issues/21158

I'm running pandas 0.23.0 and can reproduce the same error. The GitHub discussion thread shows that the error arises when a null value occurs at a nesting level greater than 0. The relevant code appears to have been changed about two months ago, and that change made its way into the 0.23.0 release two weeks ago:

https://github.com/pandas /commit/01882ba5b4c21b0caf2e6b9279fb01967aa5d650#diff-9c654764f5f21c8e9d58d9ebf14de86d
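Since the failure is tied to null values below the top nesting level, one way to sidestep json_normalize on an affected installation is to flatten the records by hand. The following is a minimal sketch, not the pandas implementation: the helper name `flatten` and the `.` separator are assumptions chosen to resemble the column names json_normalize produces, and the code avoids f-strings so it also runs on Python 3.5.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single level, keeping None values as-is.

    Hypothetical helper for illustration; not part of pandas.
    """
    flat = {}
    for key, value in record.items():
        # Build a dotted column name such as "address.street".
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects; None is never treated as a dict.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


text = """
[{"id": "1", "name": "abc",
  "address": {"flat": "123", "city": "paris", "street": null},
  "error": null}]
"""

rows = [flatten(record) for record in json.loads(text)]
# rows is a list of flat dicts and can be passed to pd.DataFrame(rows).
```

Lists inside a record are left untouched by this sketch, so it only covers payloads shaped like the one in the question.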

Other than waiting for a new release or downgrading your production environment (which is not a good idea, since it will quite likely break things), you could think about how to handle multiple package versions in your environment. Pip cannot do this unless you create separate virtual environments, and I believe conda cannot either. If you really need to load files like these, you could load the 0.22.0 package as a local module by cloning it from git, as a temporary, hacky solution used only to load your dict. Be aware that there might be some DataFrame API inconsistencies when you load with 0.22.0 and then try to use the result with 0.23.0.

Your solution of converting the strings might not be that bad after all.
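A structure-aware variant of that string replacement can be sketched as follows. It works on the parsed objects rather than on the raw text, so it does not depend on the exact spacing of `"street":null` in the payload. The helper name `replace_none` and the empty-string default are assumptions for illustration; note that, unlike the one-off string replace, it also converts the top-level `error` null.

```python
import json


def replace_none(obj, default=""):
    """Return a copy of obj with every None replaced by default.

    Hypothetical helper for illustration; not part of pandas.
    """
    if obj is None:
        return default
    if isinstance(obj, dict):
        return {key: replace_none(value, default) for key, value in obj.items()}
    if isinstance(obj, list):
        return [replace_none(item, default) for item in obj]
    return obj


text = """
[{"id": "1", "name": "abc",
  "address": {"flat": "123", "city": "paris", "street": null},
  "error": null}]
"""

cleaned = replace_none(json.loads(text))
# cleaned contains no None values and can be handed to json_normalize(cleaned).
```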

Happy hacking.
