Convert a pandas DataFrame to a dict and drop NaNs


Question

I have a pandas DataFrame with NaNs in it, like this:

import pandas as pd
import numpy as np
raw_data={'A':{1:2,2:3,3:4},'B':{1:np.nan,2:44,3:np.nan}}
data=pd.DataFrame(raw_data)
>>> data
   A   B
1  2 NaN
2  3  44
3  4 NaN

Now I want to make a dict out of it and remove the NaNs at the same time. The result should look like this:

{'A': {1: 2, 2: 3, 3: 4}, 'B': {2: 44.0}}

But the pandas to_dict function gives me a result like this:

>>> data.to_dict()
{'A': {1: 2, 2: 3, 3: 4}, 'B': {1: nan, 2: 44.0, 3: nan}} 

So how can I make a dict out of the DataFrame and get rid of the NaNs?

Recommended answer

There are many ways to accomplish this. I spent some time evaluating performance on a not-so-large (70k) dataframe. Although @der_die_das_jojo's answer is functional, it's also pretty slow.

The approach suggested by this question actually turns out to be about 5x faster on a large dataframe.

On my test dataframe (df):

The above method:

%time [ v.dropna().to_dict() for k,v in df.iterrows() ]
CPU times: user 51.2 s, sys: 0 ns, total: 51.2 s
Wall time: 50.9 s

Another slow method:

%time df.apply(lambda x: [x.dropna()], axis=1).to_dict(orient='rows')
CPU times: user 1min 8s, sys: 880 ms, total: 1min 8s
Wall time: 1min 8s

Fastest method I could find:

%time [ {k:v for k,v in m.items() if pd.notnull(v)} for m in df.to_dict(orient='records')]
CPU times: user 14.5 s, sys: 176 ms, total: 14.7 s
Wall time: 14.7 s
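
As a small runnable illustration of the same comprehension, here is a sketch applied to the DataFrame from the question rather than the 70k-row test frame. It spells the orientation as `orient='records'`, the documented name for one dict per row. The variable name `rows` is only for illustration. Note that the row labels (1, 2, 3) are not kept in this form.

```python
import numpy as np
import pandas as pd

# The example DataFrame from the question.
data = pd.DataFrame({'A': {1: 2, 2: 3, 3: 4},
                     'B': {1: np.nan, 2: 44, 3: np.nan}})

# One dict per row; any key whose value is NaN (or None) is skipped.
rows = [
    {k: v for k, v in row.items() if pd.notnull(v)}
    for row in data.to_dict(orient='records')
]
print(rows)
```

Each element of `rows` corresponds to one row of `data`, in order, with the NaN entries for column B left out.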

The output is a row-oriented list of dictionaries, so you may need to adjust it if you want the column-oriented form shown in the question.
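
For the column-oriented form, one possible adjustment is to drop the NaNs per column before converting. This is a minimal sketch on the question's example data. It was not part of the timings above, so its relative speed on a large dataframe is unknown, and it assumes column names are unique. The name `by_column` is only for illustration.

```python
import numpy as np
import pandas as pd

# The example DataFrame from the question.
data = pd.DataFrame({'A': {1: 2, 2: 3, 3: 4},
                     'B': {1: np.nan, 2: 44, 3: np.nan}})

# For each column, drop its NaNs, then convert the remaining Series
# to a dict keyed by the original index labels.
by_column = {col: data[col].dropna().to_dict() for col in data.columns}
print(by_column)
```

Because each column is filtered on its own, the inner dicts can have different keys, which matches the shape requested in the question.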

I'd be very interested if anyone finds an even faster answer to this question.
