Split / Explode a column of dictionaries into separate columns with pandas

Problem description

I have data saved in a PostgreSQL database. I am querying this data using Python 2.7 and turning it into a pandas DataFrame. However, the last column of this DataFrame has a dictionary of values inside it. The DataFrame df looks like this:

Station ID     Pollutants
8809           {"a": "46", "b": "3", "c": "12"}
8810           {"a": "36", "b": "5", "c": "8"}
8811           {"b": "2", "c": "7"}
8812           {"c": "11"}
8813           {"a": "82", "c": "15"}

I need to split this column into separate columns, so that the DataFrame df2 looks like this:

Station ID     a      b       c
8809           46     3       12
8810           36     5       8
8811           NaN    2       7
8812           NaN    NaN     11
8813           82     NaN     15

The major issue I'm having is that the dictionaries are not all the same length. But each of them contains at most the same 3 keys: 'a', 'b', and 'c'. And they always appear in the same order ('a' first, 'b' second, 'c' third).

The following code USED to work and return exactly what I wanted (df2).

objs = [df, pandas.DataFrame(df['Pollutant Levels'].tolist()).iloc[:, :3]]
df2 = pandas.concat(objs, axis=1).drop('Pollutant Levels', axis=1)
print(df2)

I was running this code just last week and it was working fine. But now my code is broken and I get this error from line [4]:

IndexError: out-of-bounds on slice (end) 

I made no changes to the code but am now getting the error. I feel this is due to my method not being robust or proper.

Any suggestions or guidance on how to split this column of dictionaries into separate columns would be super appreciated!

I think the .tolist() and .apply methods are not working on my data because each value is a single Unicode string rather than a dict, i.e.:

# My data format
u"{'a': '1', 'b': '2', 'c': '3'}"

# and not
{u'a': '1', u'b': '2', u'c': '3'}
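A quick way to test that suspicion is to look at the Python type stored in each cell. The sketch below is a hypothetical stand-in for the query result: the column name 'Pollutant Levels' is taken from the code above (the table header says Pollutants), and the two rows hold JSON-style strings rather than dicts, which is an assumption about what PostgreSQL returned.

```python
import pandas as pd

# Hypothetical stand-in for the query result: each cell is a *string*, not a dict
df = pd.DataFrame({
    "Station ID": [8809, 8812],
    "Pollutant Levels": ['{"a": "46", "b": "3", "c": "12"}', '{"c": "11"}'],
})

# Count the Python types stored in the column; str means there are no dicts to expand
print(df["Pollutant Levels"].map(type).value_counts())

# A frame built from a list of strings has a single column (0), not a, b and c
expanded = pd.DataFrame(df["Pollutant Levels"].tolist())
print(expanded.shape)
```

If the type counts show str rather than dict, the cells have to be parsed before they can be expanded. A one-column frame like this is also one plausible way for the original .iloc[:, :3] slice to end up out of bounds, though whether that raises depends on the pandas version.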

The data is imported from the PostgreSQL database in this format. Any help or ideas with this issue? Is there a way to convert the Unicode string?

Recommended answer

To convert the string to an actual dict, you can do df['Pollutant Levels'].map(eval). Afterwards, the solution below can be used to convert the dict to different columns.
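A minimal end-to-end sketch for the question's data, assuming the column is named 'Pollutant Levels' and every cell is a string holding a dict literal. It swaps in ast.literal_eval for eval: literal_eval only evaluates Python literals (dicts, strings, numbers and so on), so a string coming from the database cannot run arbitrary code. Converting the values with pd.to_numeric is an extra assumption, made so the result matches the numeric df2 shown in the question.

```python
import ast

import pandas as pd

# Sample rows from the question's table; each cell is a string, not a dict
df = pd.DataFrame({
    "Station ID": [8809, 8810, 8811, 8812, 8813],
    "Pollutant Levels": [
        '{"a": "46", "b": "3", "c": "12"}',
        '{"a": "36", "b": "5", "c": "8"}',
        '{"b": "2", "c": "7"}',
        '{"c": "11"}',
        '{"a": "82", "c": "15"}',
    ],
})

# Parse each string into a real dict; literal_eval rejects anything but literals
parsed = df["Pollutant Levels"].map(ast.literal_eval)

# One column per key, NaN where a key is missing; reuse df.index so rows line up
pollutants = pd.DataFrame(parsed.tolist(), index=df.index)

# The values are strings such as "46"; convert them to numbers (assumption)
pollutants = pollutants.apply(pd.to_numeric)

df2 = pd.concat([df.drop(columns="Pollutant Levels"), pollutants], axis=1)
print(df2)
```

If the strings are guaranteed to be valid JSON, json.loads from the standard library could replace ast.literal_eval, but it would reject the single-quoted form shown in the question's u"{'a': '1', ...}" example.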

Using a small example, you can use .apply(pd.Series):

In [2]: df = pd.DataFrame({'a':[1,2,3], 'b':[{'c':1}, {'d':3}, {'c':5, 'd':6}]})

In [3]: df
Out[3]:
   a                   b
0  1           {u'c': 1}
1  2           {u'd': 3}
2  3  {u'c': 5, u'd': 6}

In [4]: df['b'].apply(pd.Series)
Out[4]:
     c    d
0  1.0  NaN
1  NaN  3.0
2  5.0  6.0

To combine it with the rest of the dataframe, you can concat the other columns with the above result:

In [7]: pd.concat([df.drop(['b'], axis=1), df['b'].apply(pd.Series)], axis=1)
Out[7]:
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0
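For the same small example, pandas 1.0 and later also provide pd.json_normalize, which turns a list of dicts into one column per key and additionally flattens nested dicts into dotted column names such as 'x.y'. A sketch, assuming a pandas version that has the top-level pd.json_normalize:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [{"c": 1}, {"d": 3}, {"c": 5, "d": 6}]})

# One column per key; keys missing from a row become NaN
expanded = pd.json_normalize(df["b"].tolist())

# json_normalize returns a fresh 0..n-1 index, so copy df's index before combining
expanded.index = df.index
result = pd.concat([df.drop(columns="b"), expanded], axis=1)
print(result)
```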


使用您的代码,如果我忽略了 iloc 部分,这也可以使用:

In [15]: pd.concat([df.drop('b', axis=1), pd.DataFrame(df['b'].tolist())], axis=1)
Out[15]:
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0
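One caveat with the .tolist() version: pd.DataFrame(df['b'].tolist()) always gets a fresh 0 to n-1 index, while pd.concat(..., axis=1) lines rows up by index label. If df has any other index, for example after rows were filtered out, the expanded values land on the wrong rows or on extra NaN rows. A minimal sketch of the fix, passing index= explicitly; the filtered subset is a made-up illustration built from the small example above:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [{"c": 1}, {"d": 3}, {"c": 5, "d": 6}]})

# Keep only the last two rows, so the index is now [1, 2] instead of [0, 1]
subset = df[df["a"] > 1]

# Reusing subset.index keeps each expanded dict on the row it came from
expanded = pd.DataFrame(subset["b"].tolist(), index=subset.index)
result = pd.concat([subset.drop(columns="b"), expanded], axis=1)
print(result)

# For contrast: without index=..., labels 0 and 1 are used and the rows misalign
misaligned = pd.concat([subset.drop(columns="b"), pd.DataFrame(subset["b"].tolist())], axis=1)
print(len(misaligned))  # 3 rows instead of 2
```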
