Drop non-JSON object rows from a Python dataframe column

Problem description

I have a dataframe in which one column contains both JSON objects (dicts) and plain strings. I want to get rid of the rows that do not contain JSON objects.

Below is what my dataframe looks like:

import pandas as pd

df = pd.DataFrame({'A': ["hello","world",{"a":5,"b":6,"c":8},"usa","india",{"a":9,"b":10,"c":11}]})

print(df)

How should I remove the rows that contain only strings, so that afterwards I can apply the following to this column to convert the JSON objects into separate dataframe columns?

from pandas.io.json import json_normalize
df = json_normalize(df['A'])
print(df)

Recommended answer

I think I would prefer to use an isinstance check:

In [11]: df.loc[df.A.apply(lambda d: isinstance(d, dict))]
Out[11]:
                            A
2    {'a': 5, 'b': 6, 'c': 8}
5  {'d': 9, 'e': 10, 'f': 11}
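
As a standalone script, the same filter can be sketched as below. This is a minimal sketch, not the answerer's exact session: the sample frame is an assumption that mirrors the output shown above (second dict keyed d, e, f), and the helper name is_dict is hypothetical.

```python
import pandas as pd

# Assumed sample data, shaped like the output shown in the answer.
df = pd.DataFrame({"A": ["hello", "world", {"a": 5, "b": 6, "c": 8},
                         "usa", "india", {"d": 9, "e": 10, "f": 11}]})


def is_dict(value):
    """Hypothetical helper: True only for dict cells."""
    return isinstance(value, dict)


# Build a boolean mask (one entry per row), then keep only the dict rows.
mask = df["A"].apply(is_dict)
dict_rows = df.loc[mask]
print(dict_rows)
```

The original row labels are kept, so the filtered frame can still be matched back to the source rows.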

If you want to include numbers too, you can do the following (with numpy imported as np):

In [12]: df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))]
Out[12]:
                            A
2    {'a': 5, 'b': 6, 'c': 8}
5  {'d': 9, 'e': 10, 'f': 11}
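
One caveat when adjusting the type tuple: np.number matches NumPy scalar types, but plain Python int and float values are not instances of it. The sketch below compares it with numbers.Number from the standard library, which is a swapped-in alternative rather than what the answer uses. The sample values are made up for illustration.

```python
import numbers

import numpy as np
import pandas as pd

# Made-up mixed column: a string, a dict, Python numbers and a NumPy scalar.
values = ["hello", {"a": 5}, 3, 2.5, np.float64(1.5)]
df = pd.DataFrame({"A": values})

# np.number only recognises NumPy scalars, so 3 and 2.5 are rejected.
numpy_only = df["A"].apply(lambda v: isinstance(v, (dict, np.number)))

# numbers.Number accepts Python and NumPy numbers alike.
# Note that bool values also count as numbers under this check.
any_number = df["A"].apply(lambda v: isinstance(v, (dict, numbers.Number)))

print(df.loc[numpy_only])
print(df.loc[any_number])
```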

Adjust this to whichever types you want to include.

As the last step, json_normalize takes a list of JSON objects. For whatever reason a Series is no good (and gives the KeyError), so you can convert it to a list and you're good to go:

In [21]: df1 = df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))]

In [22]: json_normalize(list(df1["A"]))
Out[22]:
     a    b    c    d     e     f
0  5.0  6.0  8.0  NaN   NaN   NaN
1  NaN  NaN  NaN  9.0  10.0  11.0
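
Putting the filter and the normalization together, a minimal end-to-end sketch might look like this. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize. The sample data is again an assumption mirroring the output above, and restoring the original index is an optional extra that the answer does not include.

```python
import pandas as pd

# Assumed sample data, shaped like the output shown in the answer.
df = pd.DataFrame({"A": ["hello", "world", {"a": 5, "b": 6, "c": 8},
                         "usa", "india", {"d": 9, "e": 10, "f": 11}]})

# Keep only the dict cells of column A (as a Series).
dict_cells = df.loc[df["A"].apply(lambda v: isinstance(v, dict)), "A"]

# Pass a plain list of dicts, as the answer suggests.
flat = pd.json_normalize(dict_cells.tolist())

# Optional: restore the original row labels instead of 0..n-1.
flat.index = dict_cells.index
print(flat)
```

Keys missing from a given dict come out as NaN in that row, which is why the integer values are displayed as floats.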
