Fastest way to split a pandas dataframe into a list of subdataframes

Question

I have a large dataframe df and a list indices that contains every unique element of df.index. I now want to create a list of all the sub-dataframes indexed by the elements of indices; specifically:

list_df = [df.loc[x] for x in indices]

Running this command takes ages, though (df has about 3e6 rows and 3e3 unique index values). Is this a reasonable way to perform this operation? I would be very happy to receive any comments or suggestions that could improve the performance of this and related problems.

Thanks in advance!
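A minimal runnable version of the loop from the question, on a hypothetical toy frame (the data is made up for illustration). It also shows a pitfall worth knowing: when a label occurs only once in the index, df.loc[x] returns a Series rather than a DataFrame, whereas df.loc[[x]] (a one-element list of labels) always returns a DataFrame.

```python
import pandas as pd

# Hypothetical toy frame: 'a' and 'b' repeat in the index, 'c' occurs only once.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=['a', 'b', 'a', 'c', 'b'])
indices = df.index.unique()  # every distinct index value, in order of appearance

# The question's loop: a scalar label yields a Series when it matches a single row.
list_df = [df.loc[x] for x in indices]
print([type(x).__name__ for x in list_df])  # ['DataFrame', 'DataFrame', 'Series']

# Passing the label inside a list always returns a (possibly one-row) DataFrame.
list_df_frames = [df.loc[[x]] for x in indices]
print([type(x).__name__ for x in list_df_frames])  # ['DataFrame', 'DataFrame', 'DataFrame']
```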

Recommended answer

You can use a list comprehension over a groupby object that groups by the index (level=0); sort=False turns off the default sorting of the group keys, for a faster solution:

L = [x for i, x in df.groupby(level=0, sort=False)]
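A small self-contained sketch of the groupby approach on a hypothetical toy frame, with a check that each group matches the corresponding df.loc[[k]] lookup. With sort=False the groups come out in order of first appearance in the index, which is the same order df.index.unique() returns.

```python
import pandas as pd

# Hypothetical toy frame with a repeated, unsorted integer index.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': list('vwxyz')},
                  index=[3, 1, 3, 2, 1])

# groupby(level=0) groups rows by index value in a single pass;
# sort=False skips sorting the group keys, so groups follow first appearance.
L = [x for i, x in df.groupby(level=0, sort=False)]

# Sanity check: each group equals a label lookup on the matching distinct value.
expected = [df.loc[[k]] for k in df.index.unique()]
for got, exp in zip(L, expected):
    pd.testing.assert_frame_equal(got, exp)

print([int(g.index[0]) for g in L])  # [3, 1, 2]
```

For an unsorted index with repeated labels, each df.loc[x] lookup generally has to scan the whole index, so the per-label loop costs roughly one full pass per label, while groupby factorizes the index once and then slices. That is a plausible explanation for the gap in the timings, not a measured breakdown.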

Timings:

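import numpy as np
import pandas as pd

# 1000 rows; index labels drawn from 0..99, so each label repeats
np.random.seed(123)
N = 1000
L = list('abcdefghijklmno')
df = pd.DataFrame({'A': np.random.choice(L, N),
                   'B': np.random.randint(10, size=N)}, index=np.random.randint(100, size=N))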

In [273]: %timeit [x for i, x in df.groupby(level=0, sort=False)]
100 loops, best of 3: 9.91 ms per loop

In [274]: %timeit [df.loc[x] for x in df.index]
1 loop, best of 3: 417 ms per loop
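Note that the second timing loops over df.index, which has 1000 entries but at most 100 distinct labels, so it performs a lookup for every row and returns many duplicate sub-frames; that is more work than the question's loop over unique indices. A fairer comparison, sketched here with the standard-library timeit module, iterates over df.index.unique() and uses df.loc[[x]] so every result is a DataFrame. The function names by_groupby and by_loc_unique are illustrative only, and no particular timing result is implied; numbers depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same synthetic setup as the benchmark above (seed, sizes, column order).
np.random.seed(123)
N = 1000
letters = list('abcdefghijklmno')
df = pd.DataFrame({'A': np.random.choice(letters, N),
                   'B': np.random.randint(10, size=N)},
                  index=np.random.randint(100, size=N))


def by_groupby(frame):
    # One pass over the data; groups come out in order of first appearance.
    return [g for _, g in frame.groupby(level=0, sort=False)]


def by_loc_unique(frame):
    # One lookup per distinct index value, matching the question's intent.
    return [frame.loc[[x]] for x in frame.index.unique()]


for fn in (by_groupby, by_loc_unique):
    # Best of 3 runs of 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(df), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```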
