Drop non-JSON object rows from a Python DataFrame column
Problem description
I have a DataFrame in which one column contains both JSON objects (dicts) and plain strings. I want to get rid of the rows that do not contain JSON objects.
Below is what my DataFrame looks like:
import pandas as pd
df = pd.DataFrame({'A': ["hello","world",{"a":5,"b":6,"c":8},"usa","india",{"d":9,"e":10,"f":11}]})
print(df)
How should I remove the rows that contain only strings, so that afterwards I can apply the following to this column to convert the JSON objects into separate DataFrame columns:
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
df = json_normalize(df['A'])
print(df)
Recommended answer
I think I would prefer to use an isinstance check:
In [11]: df.loc[df.A.apply(lambda d: isinstance(d, dict))]
Out[11]:
                            A
2    {'a': 5, 'b': 6, 'c': 8}
5  {'d': 9, 'e': 10, 'f': 11}
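For anyone who wants to run the check outside an IPython session, here is a minimal standalone sketch of the same isinstance filter. The sample data and the names is_dict and dict_rows are my own choices for illustration, not part of the original session.

```python
import pandas as pd

# Sample data modelled on the question: strings mixed with dicts.
df = pd.DataFrame({"A": ["hello", "world", {"a": 5, "b": 6, "c": 8},
                         "usa", "india", {"d": 9, "e": 10, "f": 11}]})

# Boolean mask: True where the cell holds a dict.
is_dict = df["A"].apply(lambda d: isinstance(d, dict))

# Select the matching rows; the original row labels are preserved.
dict_rows = df.loc[is_dict]
print(dict_rows.index.tolist())  # [2, 5]
```

Note that the filtered frame keeps the original index labels (2 and 5), so call reset_index(drop=True) if you need a fresh 0-based index.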
If you want to include numbers too, you can do (with numpy imported as np):
In [12]: df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))]
Out[12]:
                            A
2    {'a': 5, 'b': 6, 'c': 8}
5  {'d': 9, 'e': 10, 'f': 11}
Adjust this to whichever types you want to include.
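One caveat when adjusting the types: in an object column built from a Python list, integers usually remain plain Python int objects, and those are not instances of np.number. The sketch below uses a hypothetical helper, keep_types, and made-up sample data to compare np.number with the standard library's numbers.Number, which covers both Python numerics and NumPy scalars. Treat it as an illustration under those assumptions rather than the only way to do it.

```python
import numbers

import numpy as np
import pandas as pd

# Hypothetical data: strings, a dict, a Python int and a NumPy float.
s = pd.Series(["hello", {"a": 5}, 7, np.float64(2.5), "usa"], dtype=object)


def keep_types(series, types):
    """Return the entries of `series` that are instances of `types`."""
    return series[series.map(lambda v: isinstance(v, types))]


# np.number matches NumPy scalars only, so the Python int 7 is dropped.
only_np = keep_types(s, (dict, np.number))

# numbers.Number also matches plain Python ints and floats.
any_num = keep_types(s, (dict, numbers.Number))

print(only_np.index.tolist())
print(any_num.index.tolist())
```

Keep in mind that bool is a subclass of int, so numbers.Number also lets True and False through. Exclude them explicitly if that matters for your data.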
For the last step, json_normalize takes a list of JSON objects. In the pandas version used here a Series does not work (it raises a KeyError), so convert the column to a list and you're good to go:
In [21]: df1 = df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))]
In [22]: json_normalize(list(df1["A"]))
Out[22]:
     a    b    c    d     e     f
0  5.0  6.0  8.0  NaN   NaN   NaN
1  NaN  NaN  NaN  9.0  10.0  11.0
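Putting the filter and the normalization together, a self-contained sketch might look like the following. It assumes pandas 1.0 or later, where the function is available as pd.json_normalize. The sample data and the variable names dict_rows and flat are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["hello", "world", {"a": 5, "b": 6, "c": 8},
                         "usa", "india", {"d": 9, "e": 10, "f": 11}]})

# Keep only the dict cells of column A (a Series with labels 2 and 5).
dict_rows = df.loc[df["A"].apply(lambda d: isinstance(d, dict)), "A"]

# Hand json_normalize a plain list of dicts rather than the Series.
flat = pd.json_normalize(dict_rows.tolist())

# Optional: put the original row labels back on the flattened frame.
flat.index = dict_rows.index
print(flat)
```

Restoring the index is optional, but it makes it easy to join the flattened columns back onto the original frame with df.join(flat) if needed.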