Converting numpy ndarray of dictionaries to DataFrame


Problem description

I've searched Stack Overflow for a solution to this, but all the solutions I found differ slightly from what I need.

I have a large ndarray (roughly 107 million rows); let's call it df:

    [{'A': 5, 'C': 3, 'D': 3},
     {'A': 7, 'B': 9, 'F': 5},
     {'B': 4, 'C': 7, 'E': 6}]

I need to convert it to a DataFrame as time-efficiently as possible. This is an example of the desired output:

     A    B    C    D    E    F
0  5.0  NaN  3.0  3.0  NaN  NaN
1  7.0  9.0  NaN  NaN  NaN  5.0
2  NaN  4.0  7.0  NaN  6.0  NaN

I have tried pd.DataFrame(df) and pd.DataFrame.from_dict(df), but these give me the output:

     0
0  {'A': 5, 'C': 3, 'D': 3}
1  {'A': 7, 'B': 9, 'F': 5}
2  {'B': 4, 'C': 7, 'E': 6}

The question: how do I convert df to the desired output?

I have tried anky_91's solution. It works for a list, but NOT for an ndarray. I want to avoid converting to a list, because holding 107 million values in a list causes memory errors.

pd.DataFrame(df).sort_index(axis=1)

This still gives me the same output as pd.DataFrame(df): a DataFrame containing one column, with a dictionary in each row.

Recommended answer

I think the input data is different, for example nested like this:

import pandas as pd

# Each dict is wrapped in its own one-element list
L = [[{'A': 5, 'C': 3, 'D': 3}],
     [{'A': 7, 'B': 9, 'F': 5}],
     [{'B': 4, 'C': 7, 'E': 6}]]

print(pd.DataFrame(L))
                          0
0  {'A': 5, 'C': 3, 'D': 3}
1  {'A': 7, 'B': 9, 'F': 5}
2  {'B': 4, 'C': 7, 'E': 6}
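
One way to check which case applies is to look at the type and shape of the container before calling the constructor. The snippet below is only an illustrative sketch on the sample data from the question; the names `records`, `flat`, `from_list` and `from_array` are made up here. It shows that a plain list of dicts is expanded into columns, while a 1-D object ndarray holding the same dicts ends up as a single column.

```python
import numpy as np
import pandas as pd

records = [{'A': 5, 'C': 3, 'D': 3},
           {'A': 7, 'B': 9, 'F': 5},
           {'B': 4, 'C': 7, 'E': 6}]

# A plain list of dicts: pandas expands the dict keys into columns.
from_list = pd.DataFrame(records)

# A 1-D object ndarray holding the same dicts: each dict is kept as an
# opaque cell value, so the result has a single column named 0.
flat = np.array(records)
from_array = pd.DataFrame(flat)

print(type(flat), flat.dtype, flat.shape)
print(from_list.shape)   # rows x one column per distinct key
print(from_array.shape)  # rows x 1
```

If the second shape is what you see, the data is an object array rather than a list, and the ndarray approach applies instead of flattening.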

A possible solution is to flatten it:

from itertools import chain

# Flatten the nested lists into one stream of dicts, then sort the columns
df = pd.DataFrame(chain.from_iterable(L)).sort_index(axis=1)
print(df)
     A    B    C    D    E    F
0  5.0  NaN  3.0  3.0  NaN  NaN
1  7.0  9.0  NaN  NaN  NaN  5.0
2  NaN  4.0  7.0  NaN  6.0  NaN

If the input data is a numpy array, use the solution from the comment by @Code Different:

import numpy as np
import pandas as pd

arr = np.array([{'A': 5, 'C': 3, 'D': 3},
                {'A': 7, 'B': 9, 'F': 5},
                {'B': 4, 'C': 7, 'E': 6}])

# tolist() turns the object array into a plain list of dicts
df = pd.DataFrame(arr.tolist()).sort_index(axis=1)
print(df)
     A    B    C    D    E    F
0  5.0  NaN  3.0  3.0  NaN  NaN
1  7.0  9.0  NaN  NaN  NaN  5.0
2  NaN  4.0  7.0  NaN  6.0  NaN
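
Since the question mentions memory errors when the whole array is turned into a list at once, one variation worth trying is to convert the array in slices and concatenate the pieces. This is only a sketch under assumptions: the helper name `ndarray_of_dicts_to_frame` and the `chunk_size` default are made up here, the array is assumed to be 1-D with one dict per element, and whether this actually avoids the memory error for 107 million rows depends on the machine and the data.

```python
import numpy as np
import pandas as pd

def ndarray_of_dicts_to_frame(arr, chunk_size=1_000_000):
    """Hypothetical helper: convert a 1-D object ndarray of dicts to a
    DataFrame one slice at a time, so only one slice is held as a list."""
    pieces = []
    for start in range(0, len(arr), chunk_size):
        chunk = arr[start:start + chunk_size]
        pieces.append(pd.DataFrame(chunk.tolist()))
    if not pieces:
        # pd.concat raises on an empty list, so handle an empty array here
        return pd.DataFrame()
    # Keys missing from a slice become NaN after concatenation
    return pd.concat(pieces, ignore_index=True).sort_index(axis=1)

arr = np.array([{'A': 5, 'C': 3, 'D': 3},
                {'A': 7, 'B': 9, 'F': 5},
                {'B': 4, 'C': 7, 'E': 6}])

# A tiny chunk_size is used only to exercise the chunking on sample data
df = ndarray_of_dicts_to_frame(arr, chunk_size=2)
print(df)
```

The trade-off is the extra `pd.concat` step, which copies the data once more, so it is worth timing both variants on a representative slice before running the full array.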
