How to json_normalize a column with NaNs
Problem description
- This question is specific to columns of data in a pandas.DataFrame.
- This question depends on whether the values in the column are of str, dict, or list type.
- This question addresses dealing with the NaN values when df.dropna().reset_index(drop=True) isn't a valid option.
Case 1
- With a column of str type, the values in the column must be converted to dict type with ast.literal_eval before using .json_normalize.
import numpy as np
import pandas as pd
from ast import literal_eval
df = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan]})
col_str
0 {"a": "46", "b": "3", "c": "12"}
1 {"b": "2", "c": "7"}
2 {"c": "11"}
3 NaN
type(df.iloc[0, 0])
[out]: str
df.col_str.apply(literal_eval)
Error:
df.col_str.apply(literal_eval) results in ValueError: malformed node or string: nan
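The cause of this error can be checked in isolation. The sketch below is only an illustration (the variable names `value` and `raised` are not from the original post): it shows that a missing value in an object column is a float, which literal_eval rejects because it only parses strings.

```python
import numpy as np
from ast import literal_eval

value = np.nan
print(type(value))  # NaN is a float, not a str

# literal_eval expects a string (or an AST node), so a float NaN is rejected.
try:
    literal_eval(value)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```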
Case 2
- With a column of dict type, use pandas.json_normalize to convert keys to column headers and values to rows.
df = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]})
col_dict
0 {'a': '46', 'b': '3', 'c': '12'}
1 {'b': '2', 'c': '7'}
2 {'c': '11'}
3 NaN
type(df.iloc[0, 0])
[out]: dict
pd.json_normalize(df.col_dict)
Error:
pd.json_normalize(df.col_dict) results in AttributeError: 'float' object has no attribute 'items'
Case 3
- In a column of str type, with the dict inside a list.
- To normalize the column:
  - apply literal_eval, because explode doesn't work on str type
  - explode the column to separate the dicts into separate rows
  - normalize the column
df = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]', '[{"b": "2", "c": "7"}, {"c": "11"}]', np.nan]})
col_str
0 [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]
1 [{"b": "2", "c": "7"}, {"c": "11"}]
2 NaN
type(df.iloc[0, 0])
[out]: str
df.col_str.apply(literal_eval)
Error:
df.col_str.apply(literal_eval) results in ValueError: malformed node or string: nan
Recommended answer
- As pointed out in a comment, there is always the option to:
  - df = df.dropna().reset_index(drop=True)
  - That's fine for the dummy data here, or when dealing with a dataframe where the other columns don't matter.
  - It is not a great option for dataframes with additional columns that are required.
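To make the drawback concrete, here is a small sketch. The extra `id` column and its values are hypothetical, added only to illustrate what dropna does to a frame whose other columns matter.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'id' is an extra column that must be kept for every row.
df = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'col_str': ['{"a": "46"}', '{"b": "2"}', '{"c": "11"}', np.nan],
})

dropped = df.dropna().reset_index(drop=True)

# The row for id 104 is gone, even though only col_str was missing there.
print(dropped['id'].tolist())
```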
Case 1
- Since the column contains str types, fillna with '{}' (a str).
import numpy as np
import pandas as pd
from ast import literal_eval
df = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan]})
col_str
0 {"a": "46", "b": "3", "c": "12"}
1 {"b": "2", "c": "7"}
2 {"c": "11"}
3 NaN
type(df.iloc[0, 0])
[out]: str
# fillna
df.col_str = df.col_str.fillna('{}')
# convert the column to dicts
df.col_str = df.col_str.apply(literal_eval)
# use json_normalize
df = df.join(pd.json_normalize(df.col_str)).drop(columns=['col_str'])
# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN  NaN   11
3  NaN  NaN  NaN
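As a variation on the same idea, the missing values can be handled inside the parsing step instead of with fillna. This is only a sketch, not the answer's method: the helper `parse_or_empty` and the extra `id` column are assumptions introduced here to show that the other columns survive.

```python
import numpy as np
import pandas as pd
from ast import literal_eval

def parse_or_empty(value):
    """Hypothetical helper: parse a string into a dict, map anything else (NaN) to {}."""
    if isinstance(value, str):
        return literal_eval(value)
    return {}

df = pd.DataFrame({
    'id': [101, 102, 103, 104],  # hypothetical extra column that must be kept
    'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan],
})

# Parse every cell, then flatten the resulting dicts into columns.
parsed = df['col_str'].apply(parse_or_empty)
flat = pd.json_normalize(parsed.tolist())

# Both frames share the default 0..n-1 index, so join lines the rows up.
result = df.drop(columns=['col_str']).join(flat)
print(result)
```

The join relies on df having a default RangeIndex; with any other index, reset it first or build flat with the same index.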
Case 2
- Since the column contains dict types, fillna with {} (not a str).
- This needs to be filled using a dict-comprehension, since fillna({}) does not work.
df = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]})
col_dict
0 {'a': '46', 'b': '3', 'c': '12'}
1 {'b': '2', 'c': '7'}
2 {'c': '11'}
3 NaN
type(df.iloc[0, 0])
[out]: dict
# fillna
df.col_dict = df.col_dict.fillna({i: {} for i in df.index})
# use json_normalize
df = df.join(pd.json_normalize(df.col_dict)).drop(columns=['col_dict'])
# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN  NaN   11
3  NaN  NaN  NaN
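If the dict-comprehension fill feels opaque, one possible alternative is to replace every non-dict cell with an empty dict via apply. This is a sketch under assumptions rather than the answer's approach; the `id` column and the names `cleaned`, `flat`, and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103, 104],  # hypothetical extra column that must be kept
    'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan],
})

# Replace anything that is not a dict (here, the NaN) with an empty dict.
cleaned = df['col_dict'].apply(lambda v: v if isinstance(v, dict) else {})

# Flatten the dicts and attach them to the remaining columns.
flat = pd.json_normalize(cleaned.tolist())
result = df.drop(columns=['col_dict']).join(flat)
print(result)
```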
Case 3
- Fill the NaNs with '[]' (a str).
- Now literal_eval will work.
- .explode can be used on the column to separate the dict values into rows.
- Now the NaNs need to be filled with {} (not a str).
- Then the column can be normalized.
- If the column already holds lists of dicts rather than str values, skip to .explode.
df = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]', '[{"b": "2", "c": "7"}, {"c": "11"}]', np.nan]})
col_str
0 [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]
1 [{"b": "2", "c": "7"}, {"c": "11"}]
2 NaN
type(df.iloc[0, 0])
[out]: str
# fillna
df.col_str = df.col_str.fillna('[]')
# literal_eval
df.col_str = df.col_str.apply(literal_eval)
# explode
df = df.explode('col_str').reset_index(drop=True)
# fillna again
df.col_str = df.col_str.fillna({i: {} for i in df.index})
# use json_normalize
df = df.join(pd.json_normalize(df.col_str)).drop(columns=['col_str'])
# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN    2    7
3  NaN  NaN   11
4  NaN  NaN  NaN
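The same sequence of steps can be sketched with an additional column, to show how explode repeats the other columns for each dict. The helper `parse_list`, the `id` column, and the intermediate names are assumptions made for illustration, and the two fillna calls are swapped for isinstance checks.

```python
import numpy as np
import pandas as pd
from ast import literal_eval

def parse_list(value):
    """Hypothetical helper: parse a string into a list of dicts; NaN becomes []."""
    return literal_eval(value) if isinstance(value, str) else []

df = pd.DataFrame({
    'id': [101, 102, 103],  # hypothetical extra column that must be kept
    'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]',
                '[{"b": "2", "c": "7"}, {"c": "11"}]',
                np.nan],
})

# Step 1: turn each string into a list (missing values become empty lists).
df['col_str'] = df['col_str'].apply(parse_list)

# Step 2: one row per dict; an empty list yields a single NaN row.
exploded = df.explode('col_str').reset_index(drop=True)

# Step 3: swap the remaining NaN for {} and flatten the dicts into columns.
records = [v if isinstance(v, dict) else {} for v in exploded['col_str']]
result = exploded.drop(columns=['col_str']).join(pd.json_normalize(records))
print(result)
```

Resetting the index after explode matters here: explode repeats the original index labels, and the join with the normalized frame depends on a unique 0..n-1 index.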