Iterating a Pandas dataframe over 'n' next rows

Posted: 2018/11/15 23:11:07 · Tags: python, loops, pandas, iterator

Problem description

I have this Pandas dataframe df:

station a_d direction
   a     0      0
   a     0      0
   a     1      0
   a     0      0
   a     1      0
   b     0      0
   b     1      0
   c     0      0
   c     1      0
   c     0      1
   c     1      1
   b     0      1
   b     1      1
   b     0      1
   b     1      1
   a     0      1
   a     1      1
   a     0      0
   a     1      0

I'd like to assign a value_id that increments each time the direction value changes, and to write it only on the last [0, 1] a_d pair of rows before the station (or the direction) changes. I can ignore the last dataframe rows (in this example, the last two). In other words:

station a_d direction value_id
   a     0      0
   a     0      0
   a     1      0
   a     0      0        0
   a     1      0        0
   b     0      0        0
   b     1      0        0
   c     0      0        0
   c     1      0        0
   c     0      1        1
   c     1      1        1
   b     0      1         
   b     1      1        
   b     0      1        1
   b     1      1        1
   a     0      1        1
   a     1      1        1
   a     0      0
   a     1      0

Using df.iterrows() I wrote this script:

df['value_id'] = ""
value_id = 0
row_iterator = df.iterrows()
for i, row in row_iterator:
    if i == 0:
        continue
    elif df.loc[i-1, 'direction'] != df.loc[i, 'direction']:
        value_id += 1
    for z in range(1, 11):
        if i+z >= len(df)-1:
            break
        elif df.loc[i+1, 'a_d'] == df.loc[i, 'a_d']:
            break
        elif (df.loc[i+1, 'a_d'] != df.loc[i, 'a_d']) and (df.loc[i+2, 'station'] == df.loc[i, 'station'] and df.loc[i+2, 'direction'] == df.loc[i, 'direction']):
            break
        else:
            df.loc[i, 'value_id'] = value_id

It works, but it's very slow. With a dataframe of 10*10^6 rows I need a faster way. Any ideas?

@user5402's code works well, but I noticed that adding a break after the last else also reduces the computation time:

df['value_id'] = ""
value_id = 0
row_iterator = df.iterrows()
for i, row in row_iterator:
    if i == 0:
        continue
    elif df.loc[i-1, 'direction'] != df.loc[i, 'direction']:
        value_id += 1
    for z in range(1, 11):
        if i+z >= len(df)-1:
            break
        elif df.loc[i+1, 'a_d'] == df.loc[i, 'a_d']:
            break
        elif (df.loc[i+1, 'a_d'] != df.loc[i, 'a_d']) and (df.loc[i+2, 'station'] == df.loc[i, 'station'] and df.loc[i+2, 'direction'] == df.loc[i, 'direction']):
            break
        else:
            df.loc[i, 'value_id'] = value_id
            break

Recommended answer

You are not effectively using z in the inner for loop. You never access the i+z-th row: you access the i-th, i+1-th and i+2-th rows, but never the i+z-th row, so apart from the end-of-frame guard every pass of the z loop repeats exactly the same comparisons.

You can replace that inner for loop with:

if i+2 > len(df)-1:
    pass  # fewer than two rows after row i: df.loc[i+2] would not exist
elif df.loc[i+1, 'a_d'] == df.loc[i, 'a_d']:
    pass
elif df.loc[i+2, 'station'] == df.loc[i, 'station'] and df.loc[i+2, 'direction'] == df.loc[i, 'direction']:
    pass
else:
    df.loc[i, 'value_id'] = value_id

Note that I also slightly optimized the second elif: at that point you already know that df.loc[i+1,'a_d'] does not equal df.loc[i,'a_d'].
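To check the replacement end to end, here is a self-contained sketch that drops it into the question's outer loop, using the sample data from the question. A few choices are my own rather than the answer's. The loop runs over `range(1, len(df))` instead of `df.iterrows()`: the row object was never used, and starting at 1 replaces `if i == 0: continue`. The column is created with object dtype so it can hold both `""` and integers. The guard is `i + 2 > len(df) - 1`, because the last `elif` reads `df.loc[i + 2, ...]` and would otherwise raise a `KeyError` on the second-to-last row. It also assumes a default `RangeIndex`, because `df.loc[i + 1]` is treated as "the next row".

```python
import pandas as pd

# Sample data from the question. A default RangeIndex (0..n-1) is assumed,
# because the loop treats df.loc[i + 1] as "the next row".
df = pd.DataFrame({
    "station": list("aaaaabbccccbbbbaaaa"),
    "a_d": [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "direction": [0] * 9 + [1] * 8 + [0] * 2,
})

# object dtype so the column can hold both "" and integer ids
df["value_id"] = pd.Series("", index=df.index, dtype=object)
value_id = 0
for i in range(1, len(df)):  # row 0 is skipped, as in the question
    if df.loc[i - 1, "direction"] != df.loc[i, "direction"]:
        value_id += 1  # direction changed: bump the counter
    # The answer's single check, in place of the loop over z.
    if i + 2 > len(df) - 1:
        pass  # df.loc[i + 2] would run past the end
    elif df.loc[i + 1, "a_d"] == df.loc[i, "a_d"]:
        pass  # a_d does not change at the next row
    elif (df.loc[i + 2, "station"] == df.loc[i, "station"]
          and df.loc[i + 2, "direction"] == df.loc[i, "direction"]):
        pass  # same station and direction two rows ahead
    else:
        df.loc[i, "value_id"] = value_id

print(df)
```

On the sample data this reproduces the expected value_id column from the question, including the empty cells and the two unlabeled rows at the end.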

Not having to loop over z will save a lot of time.
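Even with a single check per row, the loop still does several `df.loc` scalar lookups for each of the 10 million rows. A common pandas alternative, which is not part of the original answer, is to express the same conditions with `shift()` so they are evaluated column by column. The sketch below assumes the rule is exactly the one the loop implements: row `i` is labeled when `a_d` changes at `i + 1` and the row at `i + 2` has a different `station` or `direction`, and the first row and last two rows are never labeled. The function name `assign_value_id` is hypothetical.

```python
import numpy as np
import pandas as pd


def assign_value_id(df: pd.DataFrame) -> pd.Series:
    """Vectorized sketch of the question's labeling loop.

    Assumes columns 'station', 'a_d' and 'direction' in row order.
    Returns an object Series holding the running direction-change
    counter on labeled rows and "" everywhere else.
    """
    n = len(df)
    pos = np.arange(n)
    direction = df["direction"]

    # Counter that goes up by one whenever direction differs from the
    # previous row; row 0 compares against NaN, hence the "- 1".
    value_id = direction.ne(direction.shift()).cumsum() - 1

    # Row i is labeled when a_d changes at i + 1 and the row at i + 2
    # has a different station or a different direction.
    next_ad_differs = df["a_d"].ne(df["a_d"].shift(-1))
    two_ahead_differs = (df["station"].ne(df["station"].shift(-2))
                         | direction.ne(direction.shift(-2)))
    # Like the loop: never label row 0 or the last two rows.
    in_range = (pos >= 1) & (pos <= n - 3)

    mask = next_ad_differs & two_ahead_differs & in_range
    result = pd.Series("", index=df.index, dtype=object)
    result.loc[mask] = value_id[mask]
    return result


df = pd.DataFrame({
    "station": list("aaaaabbccccbbbbaaaa"),
    "a_d": [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "direction": [0] * 9 + [1] * 8 + [0] * 2,
})
df["value_id"] = assign_value_id(df)
print(df)
```

On the sample data this gives the same value_id column as the loop. It has not been benchmarked here on 10^7 rows. Column-wise operations like these usually run much faster than a Python-level loop, but it is worth comparing the result with the loop on a slice of the real data before switching.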
