Modular arithmetic in python to iterate a pandas dataframe


Problem description
OK, I have a big dataframe like this:

      hour    value
  0      0      1
  1      6      2
  2     12      3
  3     18      4
  4      0      5
  5      6      6
  6     12      7
  7     18      8
  8      6      9
  9     12     10
 10     18     11
 11     12     12
 12     18     13
 13      0     14

Let's not get lost here. The hour column represents the hour of the day, in steps of 6 hours. The value column is, well, exactly that; the values here are just an example, not the actual ones.

If you look closely at the hour column, you can see that some hours are missing. For instance, there is a gap between rows 7 and 8 (the value for hour 0 is missing). There are also bigger gaps, such as between rows 10 and 11 (hours 00 and 06).
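The gaps can be measured with modular arithmetic directly. A minimal sketch, assuming the readings are exactly 6 hours apart and the rows are in chronological order: the difference between consecutive hours, taken mod 24, is 6 for a normal step and larger when slots were skipped. The name `missing_before` is illustrative, not from the original post.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "hour":  [0, 6, 12, 18, 0, 6, 12, 18, 6, 12, 18, 12, 18, 0],
    "value": list(range(1, 15)),
})

# Step between consecutive hours, wrapped into [0, 24) with mod 24 (e.g. 0 - 18 = -18 -> 6)
step = df["hour"].diff() % 24
# With 6-hour sampling a normal step is 6; every extra 6 hours means one missing reading.
# The first row has no predecessor (NaN), so it is treated as having nothing missing.
missing_before = (step // 6 - 1).fillna(0).astype(int)

# Show only the rows that are preceded by at least one missing hour
print(df.assign(missing_before=missing_before)[missing_before > 0])
```

On the sample data this flags row 8 (one slot missing before it) and row 11 (two slots missing). Note that a step of 0, meaning a duplicated hour or a whole missing day, cannot be distinguished by this check and shows up as -1.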

What do I need? I would like to detect when an hour (and, of course, its value) is missing, and complete the dataframe by inserting a row there with the corresponding hour and np.nan as the value.

What have I thought of? I think this could be solved easily using modular arithmetic, in this case mod 24, since 18 + 6 = 24 ≡ 0 (mod 24). The idea is to initialize a counter to zero and add 6 to it at each step, keeping the counter modulo 24; then you can check whether each row's hour is the expected one and, if not, insert a new row with the expected hour and np.nan as the value.
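The counter idea described above can be written as a plain loop. This is a sketch, not the accepted answer: `fill_missing_hours` is a hypothetical helper, and it assumes columns named `hour` and `value`, hours that are multiples of 6 in the range 0 to 23, and rows already in chronological order.

```python
import numpy as np
import pandas as pd

def fill_missing_hours(df, step=6, period=24):
    """Hypothetical helper: insert (hour, NaN) rows wherever the hour column skips a slot."""
    rows = []
    expected = 0  # the counter, always kept modulo `period`
    for hour, value in zip(df["hour"], df["value"]):
        # Guard against hours the counter can never reach (would loop forever otherwise)
        if hour % step != 0 or not 0 <= hour < period:
            raise ValueError(f"unexpected hour {hour}")
        # Advance the counter until it matches this row; each skipped slot becomes a NaN row
        while expected != hour:
            rows.append((expected, np.nan))
            expected = (expected + step) % period
        rows.append((hour, value))
        expected = (expected + step) % period  # e.g. (18 + 6) % 24 == 0
    return pd.DataFrame(rows, columns=["hour", "value"])

# Sample data from the question
df = pd.DataFrame({
    "hour":  [0, 6, 12, 18, 0, 6, 12, 18, 6, 12, 18, 12, 18, 0],
    "value": list(range(1, 15)),
})
result = fill_missing_hours(df)
print(result)
```

Unlike a per-day reindex, this loop stops at the last observed row, so it does not pad the final incomplete day with trailing NaN rows. It is simple but iterates in Python, which may be slow on a very large dataframe.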

I don't know how to implement modular arithmetic in Python to iterate over a dataframe column.

Thank you very much.


Solution

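import pandas as pd

# A new group starts whenever the hour does not increase (e.g. 18 -> 0 or 18 -> 6)
group_hours = (df.hour <= df.hour.shift()).cumsum()

def insert_missing_hours(df):
    # Reindex each group against the full set of hours; absent hours get NaN values
    return df.set_index('hour').reindex([0, 6, 12, 18]).reset_index()

# Fill every group, then discard the (group, row) MultiIndex that apply produces
df.groupby(group_hours).apply(insert_missing_hours).reset_index(drop=True)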

The result looks like:

    hour  value
0      0    1.0
1      6    2.0
2     12    3.0
3     18    4.0
4      0    5.0
5      6    6.0
6     12    7.0
7     18    8.0
8      0    NaN
9      6    9.0
10    12   10.0
11    18   11.0
12     0    NaN
13     6    NaN
14    12   12.0
15    18   13.0
16     0   14.0
17     6    NaN
18    12    NaN
19    18    NaN

Explanation

To apply reindex, I needed to determine which rows to group together. I checked whether each row's hour was less than or equal to the previous row's hour; if so, that row starts a new group. The cumulative sum of these flags then gives each group its own number.

insert_missing_hours simply reindexes each subgroup against [0, 6, 12, 18], so any hour missing from a group appears as a new row with NaN as its value.
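For reference, here is the same grouping approach run end to end on the sample data from the question, with the intermediate group labels printed. It is a sketch: the column names come from the question, while the names `new_cycle`, `cycle` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "hour":  [0, 6, 12, 18, 0, 6, 12, 18, 6, 12, 18, 12, 18, 0],
    "value": list(range(1, 15)),
})

# True where the hour did not increase, i.e. the row starts a new 0-6-12-18 cycle
new_cycle = df["hour"] <= df["hour"].shift()
# Running count of cycle starts gives one label per cycle
group_hours = new_cycle.cumsum().rename("cycle")
print(group_hours.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]

def insert_missing_hours(group):
    # Every cycle should contain hours 0, 6, 12 and 18; missing ones become NaN rows
    return group.set_index("hour").reindex([0, 6, 12, 18]).reset_index()

result = df.groupby(group_hours).apply(insert_missing_hours).reset_index(drop=True)
print(result)
```

Because the last cycle contains only hour 0, reindexing pads it through hour 18, which is why the output ends with three NaN rows.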
