How to json_normalize a column with NaNs

Views: 68 | Published: 2021/2/13 19:57:45 | Tags: python, json, pandas, list, dictionary

Question

This question is specific to columns of data in a pandas.DataFrame.
This question depends on whether the values in the column are str, dict, or list type.
This question addresses dealing with NaN values when df.dropna().reset_index(drop=True) isn't a valid option.

Case 1

With a column of str type, the values in the column must be converted to dict type with ast.literal_eval before using .json_normalize.

import numpy as np
import pandas as pd
from ast import literal_eval

df = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan]})

                            col_str
0  {"a": "46", "b": "3", "c": "12"}
1              {"b": "2", "c": "7"}
2                       {"c": "11"}
3                               NaN

type(df.iloc[0, 0])
[out]: str

df.col_str.apply(literal_eval)

Error:

df.col_str.apply(literal_eval) results in ValueError: malformed node or string: nan

Case 2

With a column of dict type, use pandas.json_normalize to convert keys to column headers and values to rows.

df = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]})

                           col_dict
0  {'a': '46', 'b': '3', 'c': '12'}
1              {'b': '2', 'c': '7'}
2                       {'c': '11'}
3                               NaN

type(df.iloc[0, 0])
[out]: dict

pd.json_normalize(df.col_dict)

Error:

pd.json_normalize(df.col_dict) results in AttributeError: 'float' object has no attribute 'items'

Case 3

In a column of str type, with the dict inside a list.

To normalize the column:

apply literal_eval, because explode doesn't work on str type
explode the column to separate the dicts into separate rows
normalize the column

df = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]', '[{"b": "2", "c": "7"}, {"c": "11"}]', np.nan]})

                                                    col_str
0  [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]
1                       [{"b": "2", "c": "7"}, {"c": "11"}]
2                                                       NaN

type(df.iloc[0, 0])
[out]: str
    
df.col_str.apply(literal_eval)

Error:

df.col_str.apply(literal_eval) results in ValueError: malformed node or string: nan

Recommended answer

Case 2

Since the column contains dict types, fill the NaNs with {} (a dict, not a str).
fillna({}) does not work, because fillna treats a dict argument as a mapping from index label to fill value, so an empty dict fills nothing. Instead, build the fill value with a dict comprehension that maps every index label to {}.

df = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]})

                           col_dict
0  {'a': '46', 'b': '3', 'c': '12'}
1              {'b': '2', 'c': '7'}
2                       {'c': '11'}
3                               NaN

type(df.iloc[0, 0])
[out]: dict
    
# fillna
df.col_dict = df.col_dict.fillna({i: {} for i in df.index})

# use json_normalize
df = df.join(pd.json_normalize(df.col_dict)).drop(columns=['col_dict'])

# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN  NaN   11
3  NaN  NaN  NaN
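
The join above relies on df having the default 0..n-1 index, because the frame produced by json_normalize from a list of records is numbered from 0. As a minimal sketch of one way to handle other indexes, the hypothetical helper below (normalize_dict_column is not part of pandas, and the index labels 10 to 40 are made up for illustration) swaps non-dict values for {} with a list comprehension instead of fillna, then copies the original index onto the normalized frame before joining.

```python
import numpy as np
import pandas as pd


def normalize_dict_column(df, col):
    """Hypothetical helper: expand a column of dicts (possibly with NaNs) into columns."""
    # Replace anything that is not a dict (for example NaN) with an empty dict.
    dicts = [v if isinstance(v, dict) else {} for v in df[col]]
    # A list of records normalizes to a frame indexed 0..n-1,
    # so restore the original labels before joining.
    normalized = pd.json_normalize(dicts)
    normalized.index = df.index
    return df.drop(columns=[col]).join(normalized)


# Same data as Case 2, but with a non-default index (an assumption for illustration).
df = pd.DataFrame(
    {'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]},
    index=[10, 20, 30, 40],
)
result = normalize_dict_column(df, 'col_dict')
```

The isinstance check is a design choice of this sketch, not of the original answer: it also replaces None or other stray non-dict values, which may or may not be what you want.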

Case 3

Fill the NaNs with '[]' (a str).
Now literal_eval will work.
.explode can be used on the column to separate the dict values into rows.
explode turns each empty list into a NaN row, so the NaNs now need to be filled with {} (a dict, not a str).
Then the column can be normalized.

If the column already holds lists of dicts rather than str, skip to .explode.

df = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]', '[{"b": "2", "c": "7"}, {"c": "11"}]', np.nan]})

                                                    col_str
0  [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]
1                       [{"b": "2", "c": "7"}, {"c": "11"}]
2                                                       NaN

type(df.iloc[0, 0])
[out]: str
    
# fillna
df.col_str = df.col_str.fillna('[]')

# literal_eval
df.col_str = df.col_str.apply(literal_eval)

# explode
df = df.explode('col_str').reset_index(drop=True)

# fillna again
df.col_str = df.col_str.fillna({i: {} for i in df.index})

# use json_normalize
df = df.join(pd.json_normalize(df.col_str)).drop(columns=['col_str'])

# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN    2    7
3  NaN  NaN   11
4  NaN  NaN  NaN
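
The answer as shown here starts at Case 2, so Case 1 (a str column holding one dict per row) has no worked solution. The following is a minimal sketch of how the same fill-then-parse pattern from Case 3 could be applied to it. It is an assumption based on the steps above, not text from the answer. The column is passed to json_normalize as a plain list, which is a choice made here to stay independent of how a given pandas version handles Series input.

```python
from ast import literal_eval

import numpy as np
import pandas as pd

df = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan]})

# Fill NaN with the string '{}' so literal_eval has something to parse.
df.col_str = df.col_str.fillna('{}')

# Convert each str to a dict; '{}' becomes an empty dict.
df.col_str = df.col_str.apply(literal_eval)

# Normalize the dicts and replace the original column.
# This relies on df having the default 0..n-1 index.
df = df.join(pd.json_normalize(df.col_str.tolist())).drop(columns=['col_str'])
```

Because the NaN is replaced by a str before parsing, no second fillna is needed: every row is already a dict by the time json_normalize runs.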
