Fast pandas filtering

Question
I want to filter a pandas DataFrame, keeping rows whose name column value appears in a given list.

Here we have a DataFrame:
import pandas as pd

x = pd.DataFrame(
    [['sam', 328], ['ruby', 3213], ['jon', 121]],
    columns=['name', 'score'])
Now let's say we have a list, ['sam', 'ruby'], and we want to find all rows where the name is in the list, then sum their scores.
My approach is as follows:
total = 0
names = ['sam', 'ruby']
for name in names:
    identified = x[x['name'] == name]
    total = total + sum(identified['score'])
However, when the DataFrame gets extremely large, and the list of names gets very large too, everything becomes very slow.
Is there a faster option?

Thanks
Answer
Try using isin (thanks to DSM for suggesting loc over ix here):
In [78]: x = pd.DataFrame([['sam',328],['ruby',3213],['jon',121]], columns = ['name', 'score'])
In [79]: names = ['sam', 'ruby']
In [80]: x['name'].isin(names)
Out[80]:
0     True
1     True
2    False
Name: name, dtype: bool
In [81]: x.loc[x['name'].isin(names), 'score'].sum()
Out[81]: 3541
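As a side note (not part of the original answer), the same boolean mask from isin can be inverted with ~ to select the rows whose name is not in the list; a minimal sketch using the question's data:

```python
import pandas as pd

x = pd.DataFrame([['sam', 328], ['ruby', 3213], ['jon', 121]],
                 columns=['name', 'score'])
names = ['sam', 'ruby']

# ~ negates the boolean mask, keeping rows whose name is NOT in the list
excluded = x.loc[~x['name'].isin(names), 'score'].sum()
print(excluded)  # 121 (only 'jon' remains)
```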
CT Zhu suggests a faster alternative using np.in1d:
In [105]: y = pd.concat([x]*1000)
In [109]: %timeit y.loc[y['name'].isin(names), 'score'].sum()
1000 loops, best of 3: 413 µs per loop
In [110]: %timeit y.loc[np.in1d(y['name'], names), 'score'].sum()
1000 loops, best of 3: 335 µs per loop
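A note beyond the original answer: in current NumPy, np.isin is the recommended successor to np.in1d (which has since been deprecated). A sketch of the same sum using it:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([['sam', 328], ['ruby', 3213], ['jon', 121]],
                 columns=['name', 'score'])
names = ['sam', 'ruby']

# np.isin is the modern replacement for np.in1d
total = x.loc[np.isin(x['name'], names), 'score'].sum()
print(total)  # 3541
```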