Fastest way to iterate through a pandas dataframe?

Views: 110 | Published: 2018/11/15 12:24:37 | Tags: python, database, python-2.7, pandas, ipython

Question

How do I go through a dataframe and return only the rows that meet a certain condition? The condition has to be tested on values from previous rows and other columns. For example:

          #1    #2    #3    #4
1/1/1999   4     2     4     5
1/2/1999   5     2     3     3
1/3/1999   5     2     3     8
1/4/1999   6     4     2     6
1/5/1999   8     3     4     7
1/6/1999   3     2     3     8
1/7/1999   1     3     4     1

I would like to test a few conditions for each row and, if all conditions pass, append the row to a list. For example:

for row in dataframe:
    if [row-1, column 0] + [row-2, column 3] >= 6:
        append row to a list

I may have up to 3 conditions, all of which must be true for the row to be returned. The way I am thinking about doing it is to make a list, for each condition, of all the observations for which it is true, and then make a separate list of the rows that appear in all three lists.

My two questions are as follows:

What is the fastest way to get all of the rows that meet a certain condition based on previous rows? Looping through a dataframe of 5,000 rows seems like it may take too long, especially if up to 3 conditions have to be tested.

What is the best way to get a list of the rows that meet all 3 conditions?

Recommended answer

The quickest way to select rows is to not iterate through the rows of the dataframe at all. Instead, create a mask (a boolean array) with True values for the rows you wish to select, and then call df[mask] to select them. Here shift(1) and shift(2) line up each row with the values from one and two rows earlier:

mask = (df['column 0'].shift(1) + df['column 3'].shift(2) >= 6)
newdf = df[mask]
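
As a minimal sketch, the same idea can be applied to the sample table from the question. The column labels '#1' to '#4' and the date strings are copied from that table; treating "column 0" and "column 3" as positional columns (selected with iloc) is an assumption about what the question meant.

```python
import pandas as pd

# Sample data copied from the question; the index holds the date labels as plain strings.
df = pd.DataFrame(
    {
        "#1": [4, 5, 5, 6, 8, 3, 1],
        "#2": [2, 2, 2, 4, 3, 2, 3],
        "#3": [4, 3, 3, 2, 4, 3, 4],
        "#4": [5, 3, 8, 6, 7, 8, 1],
    },
    index=["1/1/1999", "1/2/1999", "1/3/1999", "1/4/1999",
           "1/5/1999", "1/6/1999", "1/7/1999"],
)

# [row-1, column 0] + [row-2, column 3] >= 6, written with shift instead of a loop.
# Assumption: "column 0" and "column 3" are positions, hence iloc.
prev_col0 = df.iloc[:, 0].shift(1)   # value of column 0 one row earlier
prev2_col3 = df.iloc[:, 3].shift(2)  # value of column 3 two rows earlier

mask = (prev_col0 + prev2_col3) >= 6
newdf = df[mask]
print(newdf)
```

The first two rows have no complete history, so the shifted sum is NaN there. A comparison against NaN evaluates to False, which means those rows are dropped from the selection rather than raising an error.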

To combine multiple conditions with logical-and, use &:

mask = ((...) & (...))

For logical-or use |:

mask = ((...) | (...))
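
The question mentions up to three conditions. The sketch below shows one way to combine three masks. The conditions cond1, cond2 and cond3 are hypothetical and chosen only for illustration, since the question does not say what the real ones are. Naming each condition first also sidesteps an operator-precedence pitfall: & and | bind more tightly than comparisons such as > in Python, so an inline comparison has to be wrapped in parentheses.

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": range(10, 20, 2)})

# Three hypothetical conditions, each a boolean Series aligned with df.
cond1 = df["A"].shift(1) + df["B"].shift(2) > 12  # depends on the two previous rows
cond2 = df["B"] > df["B"].shift(1)                # B is larger than in the previous row
cond3 = df["A"] % 2 == 0                          # A is even

mask_all = cond1 & cond2 & cond3  # rows where every condition holds
mask_any = cond1 | cond2 | cond3  # rows where at least one condition holds

rows_all = df[mask_all]
print(rows_all)
```

This replaces the plan of building one list per condition and intersecting the lists: the & operator performs the intersection row by row.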

For example,

In [74]: import pandas as pd

In [75]: df = pd.DataFrame({'A':range(5), 'B':range(10,20,2)})

In [76]: df
Out[76]: 
   A   B
0  0  10
1  1  12
2  2  14
3  3  16
4  4  18

In [77]: mask = (df['A'].shift(1) + df['B'].shift(2) > 12)

In [78]: mask
Out[78]: 
0    False
1    False
2    False
3     True
4     True
dtype: bool

In [79]: df[mask]
Out[79]: 
   A   B
3  3  16
4  4  18
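
The question also asks for a list of the matching rows. As a sketch, the selected rows can be turned into plain Python lists after masking; the variable names here are illustrative only. The explicit loop at the end is included just to check that the mask selects the same rows as the row-by-row approach described in the question.

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": range(10, 20, 2)})
mask = df["A"].shift(1) + df["B"].shift(2) > 12

# Index labels of the selected rows, as a list.
selected_labels = df[mask].index.tolist()

# The selected rows themselves, one dict per row.
selected_rows = df[mask].to_dict("records")

# Row-by-row version for comparison; it starts at 2 because
# earlier rows do not have two previous rows to look at.
looped_labels = []
for i in range(2, len(df)):
    if df["A"].iloc[i - 1] + df["B"].iloc[i - 2] > 12:
        looped_labels.append(df.index[i])

print(selected_labels)
print(selected_rows)
print(looped_labels == selected_labels)
```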
