Filtering in Pandas dataframe slow for "large" number of groups?


Question

I have a dataframe with about 200k rows, which I'm trying to filter as follows:

>>> df.groupby(key).filter(lambda group: len(group) > 100)

where key is a list of columns. This runs in about 3 seconds when the key specified divides the dataframe into 800 or so groups. However, if I add another column to the key, increasing the number of groups to around 2500, the execution sucks up all my memory and basically crashes my system unless I terminate the script.
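For concreteness, here is a minimal sketch of a setup like this (the column names, sizes, and group counts are invented for illustration; the original post does not give them):

import numpy as np
import pandas as pd

# Hypothetical reproduction: ~200k rows, a multi-column key.
n = 200_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'a': rng.integers(0, 40, n),
    'b': rng.integers(0, 20, n),
    # a date column stored as Timestamps, relevant to the answer below
    'ts': pd.Timestamp('2013-01-01') + pd.to_timedelta(rng.integers(0, 3, n), unit='D'),
    'value': rng.normal(size=n),
})

key = ['a', 'b']          # ~800 groups: runs in a few seconds
# key = ['a', 'b', 'ts']  # ~2400 groups: the problematic case
filtered = df.groupby(key).filter(lambda g: len(g) > 100)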

I can do the same by iterating over the groups, but it's clumsy compared to the above one-liner, and makes me wonder why the filter function is so limited.
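The loop version might be spelled as in the first snippet below. A common vectorized alternative is to compare each row against its group's size via transform, which avoids calling a Python lambda once per group (the 'value' column name is an assumption carried over from the sketch above):

# Iterating over the groups by hand (same threshold as the one-liner):
pieces = [g for _, g in df.groupby(key) if len(g) > 100]
filtered = pd.concat(pieces) if pieces else df.iloc[:0]

# Vectorized alternative: broadcast each group's size back to its rows.
filtered = df[df.groupby(key)['value'].transform('size') > 100]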

Could someone please explain to me if this is to be expected, and if so why?

Thanks!

Answer

I found a solution. One of the columns contains dates that I have represented as Timestamp objects. When I convert the Timestamp objects to strings, the grouping works quickly without problems!
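A sketch of that workaround, reusing the assumed 'ts' column name from the earlier snippet:

# Stringify the Timestamp column before grouping ('ts' is an assumed name).
df['ts'] = df['ts'].astype(str)
filtered = df.groupby(key).filter(lambda g: len(g) > 100)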
