Pandas - group by consecutive ranges


Problem description

I have a dataframe with the following structure - Start, End and Height.

Some properties of the dataframe:

  • A row in the dataframe always starts where the previous row ended, i.e. if the end of row n is 100 then the start of row n+1 is 101.
  • The height of row n+1 is always different from the height of row n (this is the reason the data is in different rows).

I'd like to group the dataframe so that heights fall into buckets of width 5, i.e. the buckets are 0, 1-5, 6-10, 11-15 and >15, and consecutive rows in the same bucket are merged into one row.

See the code example below; what I'm looking for is the implementation of the group_by_bucket function.

I tried looking at other questions but couldn't find an exact answer to what I was looking for.

Thanks in advance!

>>> d = pd.DataFrame([[1,3,8], [4,10,7], [11,17,6], [18,26,12], [27,30,15], [31,40,6], [41,42,7]], columns=['start','end','height'])
>>> d
   start  end  height
0      1    3       8
1      4   10       7
2     11   17       6
3     18   26      12
4     27   30      15
5     31   40       6
6     41   42       7
>>> d_gb = group_by_bucket(d)
>>> d_gb
   start  end height_grouped
0      1   17           6_10
1     18   30          11_15
2     31   42           6_10

Solution

One way to do that:

import pandas as pd

df = pd.DataFrame([[1,3,10], [4,10,7], [11,17,6], [18,26,12],
                   [27,30,15], [31,40,6], [41,42,6]],
                  columns=['start','end','height'])

Use cut to assign each height to a bucket:

df['groups']=pd.cut(df.height,[-1,0,5,10,15,1000])
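To see why the bin edges [-1,0,5,10,15,1000] give the buckets 0, 1-5, 6-10, 11-15 and >15, here is a small sketch that feeds the boundary heights through pd.cut. The sample heights are made up for illustration; by default each interval is open on the left and closed on the right.

```python
import pandas as pd

# Boundary heights chosen for illustration (not from the question's data).
heights = pd.Series([0, 1, 5, 6, 10, 11, 15, 16])
bins = [-1, 0, 5, 10, 15, 1000]

# Each value lands in the interval (left, right] that contains it.
groups = pd.cut(heights, bins)

# Integer position of each value's bucket: 0 for (-1, 0], 1 for (0, 5], ...
codes = groups.cat.codes
print(pd.DataFrame({"height": heights, "bucket": groups, "code": codes}))
```

Note that 1000 acts as an upper cap here; a height above it would fall outside every bin and come back as missing.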

Find the break points, i.e. the rows where the bucket changes:

df['categories']=(df.groups!=df.groups.shift()).cumsum()

Then df is:

"""
   start  end  height    groups  categories
0      1    3      10   (5, 10]           0
1      4   10       7   (5, 10]           0
2     11   17       6   (5, 10]           0
3     18   26      12  (10, 15]           1
4     27   30      15  (10, 15]           1
5     31   40       6   (5, 10]           2
6     41   42       6   (5, 10]           2
"""
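The categories column above comes from comparing each row's bucket with the previous row's bucket and taking a running sum of the changes. A minimal sketch of that trick on plain integer bucket codes (the values are made up for illustration) might look like this:

```python
import pandas as pd

# Hypothetical bucket codes for five consecutive rows.
bucket_codes = pd.Series([2, 2, 3, 3, 2])

# True wherever the bucket differs from the previous row. The first row is
# compared with a missing value, so it also counts as a change here.
changed = bucket_codes.ne(bucket_codes.shift())

# The running sum gives every run of equal consecutive buckets its own id.
run_id = changed.cumsum()
print(run_id.tolist())
```

In this sketch the ids start at 1; the starting number does not matter, only that each consecutive run gets a distinct id, so the last row gets a new id even though its bucket already appeared earlier.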

Define which value to keep from each column:

f = {'start':['first'],'end':['last'], 'groups':['first']}

Then use the groupby.agg function:

df.groupby('categories').agg(f)
"""
              groups  end start
               first last first
categories                     
0            (5, 10]   17     1
1           (10, 15]   30    18
2            (5, 10]   42    31
"""
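Putting the steps together, a group_by_bucket function in the shape the question asks for could be sketched as follows. This is one possible arrangement, not the answerer's exact code: the label strings ("6_10", "gt_15", etc.), the use of np.inf as the top edge, and the named-aggregation form of agg (available in pandas 0.25 and later) are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def group_by_bucket(df):
    """Merge consecutive rows whose heights fall in the same bucket."""
    # Right-inclusive bucket edges: 0, 1-5, 6-10, 11-15, >15.
    # The label strings are illustrative, chosen to match the question.
    edges = [-1, 0, 5, 10, 15, np.inf]
    labels = ["0", "1_5", "6_10", "11_15", "gt_15"]
    bucket = pd.cut(df["height"], bins=edges, labels=labels)

    # A new run starts wherever the bucket code differs from the previous row.
    codes = bucket.cat.codes
    run_id = codes.ne(codes.shift()).cumsum()

    return (
        df.assign(height_grouped=bucket.astype(str))
        .groupby(run_id)
        .agg(start=("start", "first"),
             end=("end", "last"),
             height_grouped=("height_grouped", "first"))
        .reset_index(drop=True)
    )


d = pd.DataFrame(
    [[1, 3, 8], [4, 10, 7], [11, 17, 6], [18, 26, 12],
     [27, 30, 15], [31, 40, 6], [41, 42, 7]],
    columns=["start", "end", "height"],
)
d_gb = group_by_bucket(d)
print(d_gb)
```

Named aggregation keeps the result columns flat (start, end, height_grouped) instead of the two-level column header that the dictionary-of-lists form produces.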
