How to deal with this complex logic in python pandas?
Problem description
I have some data with the following structure. It is stored in a pandas DataFrame that I named df.
Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
.....................
.....................
2016-11-29,11:52:36,2
2016-11-29,11:52:43,2
2016-11-29,11:52:45,2
2016-11-29,11:52:55,1
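For anyone reproducing this, a minimal sketch of building df from the first ten rows shown above might look like the following. It assumes the text is comma-separated and that Data2 should be parsed as a Timedelta so that elapsed seconds can be computed; the name `raw` is only illustrative and the elided rows are left out.

```python
import io

import pandas as pd

# First ten rows of the sample above; `raw` is an illustrative name.
raw = """Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
"""

df = pd.read_csv(io.StringIO(raw))
# Assumption: the time-of-day column is converted to a Timedelta,
# which makes .total_seconds() available on each value.
df["Data2"] = pd.to_timedelta(df["Data2"])
print(df.dtypes)
```

With Data2 stored this way, the difference between two rows can be expressed directly in seconds.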
I want to select the rows that meet the following requirements.
- The first row's timestamp is 2016-04-29,00:40:15. I want the next row in this dataframe that is more than 18 seconds later than this reference row, which gives the second row: 2016-04-29,00:40:35,2. That row becomes the new reference, so the third row is: 2015-04-29,00:40:55,1
- If the next row's Flag differs from the reference row's Flag, I take that row regardless of whether 18 seconds have passed.
Applying both requirements, I get the following data:
Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2015-04-29,00:40:43,2
2015-04-29,00:40:55,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
.....................
Solution
Refer to the Stack Overflow documentation.
I built a generator to produce the rows, then used pd.concat:
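import pandas as pd

def get_row(df):
    # ref holds the last row that was yielded (the reference row)
    ref = None
    for i, row in df.iterrows():
        if ref is not None:
            # cond1: more than 18 seconds after the reference row
            # (this relies on Data2 holding Timedelta values)
            cond1 = (row.Data2.total_seconds() -
                     ref.Data2.total_seconds() > 18)
            # cond2: the flag differs from the reference row's flag
            cond2 = row.Flag != ref.Flag
        if ref is None or cond1 or cond2:
            yield row
            ref = row

pd.concat([r for r in get_row(df)], axis=1).T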
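As a hedged variation on the generator above, the sketch below applies the same two rules to the first ten sample rows but collects index labels and selects them with df.loc, which keeps the original column dtypes (concatenating row Series and transposing tends to leave object columns). The helper name `select_labels` and the `gap_seconds` parameter are hypothetical, and, like the generator, it compares Data2 only and ignores Data1.

```python
import pandas as pd

# First ten sample rows from the question; Data2 is parsed as a Timedelta.
df = pd.DataFrame({
    "Data1": ["2016-04-29"] * 3 + ["2015-04-29"] * 7,
    "Data2": pd.to_timedelta([
        "00:40:15", "00:40:24", "00:40:35", "00:40:36", "00:40:43",
        "00:40:45", "00:40:55", "00:41:05", "00:41:16", "00:41:17",
    ]),
    "Flag": [1, 2, 2, 2, 2, 2, 1, 1, 1, 2],
})


def select_labels(frame, gap_seconds=18):
    """Return index labels of the rows kept by the two rules (hypothetical helper)."""
    keep = []
    ref_time = ref_flag = None
    for label, time, flag in zip(frame.index, frame["Data2"], frame["Flag"]):
        first = ref_time is None
        if (first
                or (time - ref_time).total_seconds() > gap_seconds
                or flag != ref_flag):
            keep.append(label)
            # The kept row becomes the new reference row.
            ref_time, ref_flag = time, flag
    return keep


result = df.loc[select_labels(df)]
print(result)
```

On these ten rows the kept labels are 0, 1, 4, 6, 8 and 9, which correspond to the six rows listed as the desired output in the question.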
Timing
Because @Kartik insisted :-)