Iterating through a pandas dataframe
Question
I have a pandas dataframe in which one column flags whether the location value in another column stays the same in the row below it (True) or changes (False). For example:
2013-02-05 19:45:00 (39.94, -86.159) True
2013-02-05 19:50:00 (39.94, -86.159) True
2013-02-05 19:55:00 (39.94, -86.159) False
2013-02-05 20:00:00 (39.777, -85.995) False
2013-02-05 20:05:00 (39.775, -85.978) True
2013-02-05 20:10:00 (39.775, -85.978) True
2013-02-05 20:15:00 (39.775, -85.978) False
2013-02-05 20:20:00 (39.94, -86.159) True
2013-02-05 20:25:00 (39.94, -86.159) False
What I want to do is go through this dataframe row by row and find the rows with False. Then, perhaps in a new column, I want the total 'continuous' time spent at that place. The same place can be visited again, as in the example above; in that case each visit is treated as a separate stay. So, for the above example, something like:
2013-02-05 19:45:00 (39.94, -86.159) True 0
2013-02-05 19:50:00 (39.94, -86.159) True 0
2013-02-05 19:55:00 (39.94, -86.159) False 15
2013-02-05 20:00:00 (39.777, -85.995) False 5
2013-02-05 20:05:00 (39.775, -85.978) True 0
2013-02-05 20:10:00 (39.775, -85.978) True 0
2013-02-05 20:15:00 (39.775, -85.978) False 15
2013-02-05 20:20:00 (39.94, -86.159) True 0
2013-02-05 20:25:00 (39.94, -86.159) False 10
I would then plot a histogram of these 'continuous' times per day using the hist() function. How can I get the second dataframe from the first by iterating through it? I'm new to Python and pandas, and the real data file is huge, so I need something reasonably efficient.
Recommended answer
Here is another approach:
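# Count the False rows cumulatively, then shift down one row so each False row stays in the group it closes.
df['group'] = (df.condition == False).astype('int').cumsum().shift(1).fillna(0).astype('int')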
df
date long lat condition group
2/5/2013 19:45:00 39.940 -86.159 True 0
2/5/2013 19:50:00 39.940 -86.159 True 0
2/5/2013 19:55:00 39.940 -86.159 False 0
2/5/2013 20:00:00 39.777 -85.995 False 1
2/5/2013 20:05:00 39.775 -85.978 True 2
2/5/2013 20:10:00 39.775 -85.978 True 2
2/5/2013 20:15:00 39.775 -85.978 False 2
2/5/2013 20:20:00 39.940 -86.159 True 3
2/5/2013 20:25:00 39.940 -86.159 False 3
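To see why the shift is needed, the intermediate steps can be laid out on the sample data. This is only a minimal sketch: the small frame is rebuilt by hand from the question's example with just the `condition` column (assumed to have boolean dtype), and the names `closes`, `running`, and `group` are illustrative rather than part of the original answer.

```python
import pandas as pd

# Hand-built copy of the question's `condition` column (assumed boolean dtype).
df = pd.DataFrame({
    "condition": [True, True, False, False, True, True, False, True, False],
})

# 1 on every row that ends a stay (condition is False), 0 elsewhere.
closes = (~df["condition"]).astype(int)

# Running count of stays that have ended up to and including each row.
running = closes.cumsum()

# Shift down one row so a closing row keeps the label of the stay it ends;
# the first row has no predecessor, so it is filled with 0.
group = running.shift(1, fill_value=0)

print(pd.DataFrame({"closes": closes, "running": running, "group": group}))
```

Without the shift, each False row would already carry the next stay's label and would be counted with the wrong group.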
# Rows are 5 minutes apart, so each group's duration in minutes is 5 times its row count.
df['result'] = df.groupby(['group']).date.transform(lambda sdf: 5 * len(sdf))
df
date long lat condition group result
2/5/2013 19:45:00 39.940 -86.159 True 0 15
2/5/2013 19:50:00 39.940 -86.159 True 0 15
2/5/2013 19:55:00 39.940 -86.159 False 0 15
2/5/2013 20:00:00 39.777 -85.995 False 1 5
2/5/2013 20:05:00 39.775 -85.978 True 2 15
2/5/2013 20:10:00 39.775 -85.978 True 2 15
2/5/2013 20:15:00 39.775 -85.978 False 2 15
2/5/2013 20:20:00 39.940 -86.159 True 3 10
2/5/2013 20:25:00 39.940 -86.159 False 3 10
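The table above repeats the total on every row of a stay, whereas the question asked for the total on the closing False row and 0 elsewhere. One possible way to get that layout is sketched below. It assumes rows are evenly spaced 5 minutes apart (the constant `STEP_MINUTES` is a name introduced here), rebuilds the sample data by hand, and uses `Series.where` to zero out the True rows; it is a variation on the answer, not the answer's own code.

```python
import pandas as pd

STEP_MINUTES = 5  # assumption: rows are evenly spaced 5 minutes apart

# Hand-built sample matching the answer's table (location columns omitted).
df = pd.DataFrame({
    "date": pd.date_range("2013-02-05 19:45", periods=9, freq="5min"),
    "condition": [True, True, False, False, True, True, False, True, False],
})

# Same grouping idea as the answer: each False row closes a stay.
df["group"] = (~df["condition"]).cumsum().shift(1, fill_value=0)

# Number of rows in each stay, broadcast back to every row of that stay.
rows_in_stay = df.groupby("group")["condition"].transform("size")

# Keep the total only on the closing (False) row; all other rows get 0.
df["result"] = (rows_in_stay * STEP_MINUTES).where(~df["condition"], 0)

print(df)
```

The values on the False rows, `df.loc[~df["condition"], "result"]`, are the ones that would go into the per-day histogram.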