How to deal with this complex logic in python pandas?


Problem description

I have some data with the following structure. It is stored in a python pandas DataFrame that I named df.

Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
.....................
.....................
2016-11-29,11:52:36,2
2016-11-29,11:52:43,2
2016-11-29,11:52:45,2
2016-11-29,11:52:55,1

I want to select the rows that meet the following requirements.

  1. The first row's timestamp is 2016-04-29,00:40:15. Taking it as the reference row, I want the next row in the dataframe whose time is more than 18 seconds later than the reference row. That gives the second row: 2016-04-29,00:40:35,2. The third row is: 2015-04-29,00:40:55,1
  2. If the next row's Flag differs from the reference row's Flag, I want that row regardless of whether 18 seconds have passed.

With both requirements applied, I expect to get the following data:

Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2015-04-29,00:40:43,2
2015-04-29,00:40:55,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
.....................

Solution

Refer to the Stack Overflow documentation.

I built a generator to produce the rows to keep, then combined them with pd.concat.

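import pandas as pd

# Assumes Data2 already holds timedeltas, e.g. via pd.to_timedelta(df['Data2'])
def get_row(df):
    ref = None  # last row that was kept
    for i, row in df.iterrows():
        if ref is not None:
            # requirement 1: more than 18 seconds after the reference row
            cond1 = (row.Data2.total_seconds() -
                     ref.Data2.total_seconds() > 18)
            # requirement 2: Flag differs from the reference row
            cond2 = row.Flag != ref.Flag
        # the first row is always kept; cond1/cond2 are only read once ref is set
        if ref is None or cond1 or cond2:
            yield row
            ref = row

pd.concat([r for r in get_row(df)], axis=1).T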
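To try this on the sample rows from the question, here is a minimal end-to-end sketch. The question does not show how df was built, so loading it from a CSV string and converting Data2 with pd.to_timedelta are assumptions made here for illustration. get_row is repeated only so that the block runs on its own.

```python
import io

import pandas as pd

# Sample rows copied from the question (the elided middle rows are left out).
CSV = """Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
"""

df = pd.read_csv(io.StringIO(CSV))
# Assumption: Data2 is a time-of-day string; as a timedelta it supports
# .total_seconds(), which the generator relies on.
df["Data2"] = pd.to_timedelta(df["Data2"])


def get_row(df):
    """Yield each row that is >18 s after, or differs in Flag from, the last kept row."""
    ref = None  # last row that was kept
    for _, row in df.iterrows():
        if ref is not None:
            cond1 = row.Data2.total_seconds() - ref.Data2.total_seconds() > 18
            cond2 = row.Flag != ref.Flag
        if ref is None or cond1 or cond2:
            yield row
            ref = row


rows = list(get_row(df))
result = pd.concat(rows, axis=1).T

# Plain Python values, convenient for checking which rows were kept.
kept_seconds = [int(r.Data2.total_seconds()) for r in rows]
kept_flags = [int(r.Flag) for r in rows]
print(result)
```

On these ten rows the generator keeps the six rows listed in the question's expected output. Note that only Data2 is compared, so the Data1 date is ignored; if the data spans more than one day, the two columns would probably need to be combined into a full timestamp first.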


Timing

Because @Kartik insisted :-)
