Pandas - group by consecutive ranges

Views: 110  Published: 2018/5/30 13:58:14  Tags: python pandas group-by intervals


Question


I have a DataFrame with three columns: start, end and height.

Some properties of the DataFrame:

A row always starts right after the previous row ended, i.e. if row n ends at 100, then row n+1 starts at 101.

The height of row n+1 is always different from the height of row n (this is the reason the data is in different rows).
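These two properties are what let a merged row take its start from the first row of a run and its end from the last. A minimal sketch that checks them, assuming the start/end/height column names from the question; `check_properties` is a hypothetical helper name:

```python
import pandas as pd


def check_properties(df: pd.DataFrame) -> bool:
    """Hypothetical helper: does df satisfy both properties stated in the question?"""
    # Row n+1 must start one past row n's end; row 0 has no predecessor, so skip it
    contiguous = (df['start'] == df['end'].shift() + 1).iloc[1:].all()
    # Consecutive rows must never repeat a height
    heights_change = (df['height'] != df['height'].shift()).iloc[1:].all()
    return bool(contiguous and heights_change)


d = pd.DataFrame([[1, 3, 8], [4, 10, 7], [11, 17, 6], [18, 26, 12],
                  [27, 30, 15], [31, 40, 6], [41, 42, 7]],
                 columns=['start', 'end', 'height'])
print(check_properties(d))  # True
```

The answer's sample frame, whose last two rows share height 6, fails the second check; grouping by consecutive buckets still works on it, since equal heights simply land in the same run.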

I'd like to group the DataFrame so that heights fall into buckets 5 units wide, i.e. the buckets are 0, 1-5, 6-10, 11-15 and >15, and consecutive rows in the same bucket are merged into a single row spanning from the first row's start to the last row's end.

See the code example below, where what I'm looking for is the implementation of the group_by_bucket function.

I tried looking at other questions but couldn't find an exact answer to what I was looking for.

Thanks in advance!
>>> d = pd.DataFrame([[1, 3, 8], [4, 10, 7], [11, 17, 6], [18, 26, 12], [27, 30, 15], [31, 40, 6], [41, 42, 7]], columns=['start', 'end', 'height'])
>>> d
   start  end  height
0      1    3       8
1      4   10       7
2     11   17       6
3     18   26      12
4     27   30      15
5     31   40       6
6     41   42       7
>>> d_gb = group_by_bucket(d)
>>> d_gb
   start  end height_grouped
0      1   17           6_10
1     18   30          11_15
2     31   42           6_10
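A minimal sketch of group_by_bucket that produces the question's output format (start, end, height_grouped). It assumes integer heights >= 0, follows the same pd.cut plus shift/cumsum idea as the answer, replaces the answer's 1000 upper edge with infinity, and uses named aggregation (pandas 0.25 or later). The label '>15' for the last bucket is an assumption, since the question's expected output never shows it:

```python
import pandas as pd

# Assumed bucket edges: (-1, 0], (0, 5], (5, 10], (10, 15], (15, inf], i.e. integer heights >= 0
BINS = [-1, 0, 5, 10, 15, float('inf')]
# Labels follow the question's "6_10" style; the text of the last label is an assumption
LABELS = ['0', '1_5', '6_10', '11_15', '>15']


def group_by_bucket(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse consecutive rows whose heights fall in the same bucket."""
    buckets = pd.cut(df['height'], bins=BINS, labels=LABELS)
    # A new run starts whenever the bucket code differs from the previous row's
    codes = buckets.cat.codes
    run_id = (codes != codes.shift()).cumsum().rename('run')
    return (
        df.assign(height_grouped=buckets.astype(str))
        .groupby(run_id)
        .agg(start=('start', 'first'),
             end=('end', 'last'),
             height_grouped=('height_grouped', 'first'))
        .reset_index(drop=True)
    )


d = pd.DataFrame([[1, 3, 8], [4, 10, 7], [11, 17, 6], [18, 26, 12],
                  [27, 30, 15], [31, 40, 6], [41, 42, 7]],
                 columns=['start', 'end', 'height'])
print(group_by_bucket(d))
```

On the sample frame this yields starts 1, 18, 31, ends 17, 30, 42 and buckets 6_10, 11_15, 6_10, matching the expected d_gb.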

Solution
One way to do it:
import pandas as pd
df = pd.DataFrame([[1, 3, 10], [4, 10, 7], [11, 17, 6], [18, 26, 12], [27, 30, 15], [31, 40, 6], [41, 42, 6]], columns=['start', 'end', 'height'])
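Use pd.cut to put each height into a bucket; the intervals are right-closed, so the bins are (-1, 0], (0, 5], (5, 10], (10, 15] and (15, 1000]:
df['groups'] = pd.cut(df.height, [-1, 0, 5, 10, 15, 1000])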
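To see where the boundary values land with the answer's edges, a small check on made-up heights. It uses `cat.codes`, the 0-based position of each value's interval (-1 when no interval matches), and also shows that a height above 1000 falls outside every bin unless the last edge is infinite:

```python
import pandas as pd

EDGES = [-1, 0, 5, 10, 15, 1000]  # the answer's bin edges

heights = pd.Series([0, 1, 5, 6, 10, 11, 15, 16, 40])
buckets = pd.cut(heights, EDGES)  # right-closed intervals by default (right=True)
# cat.codes: 0-based position of each value's interval among the five buckets
print(buckets.cat.codes.tolist())  # [0, 1, 1, 2, 2, 3, 3, 4, 4]

# A height above the last edge matches no interval, so it becomes NaN (code -1)
too_big = pd.cut(pd.Series([2000]), EDGES)
print(too_big.cat.codes.tolist())  # [-1]

# An infinite last edge keeps every large height in the ">15" bucket
open_ended = pd.cut(pd.Series([2000]), EDGES[:-1] + [float('inf')])
print(open_ended.cat.codes.tolist())  # [4]
```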
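Find the break points: a row starts a new run whenever its bucket differs from the previous row's, and the cumulative sum of those breaks gives each run its own number:
df['categories'] = (df.groups != df.groups.shift()).cumsum()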
Then df is:
"""
   start  end  height    groups  categories
0      1    3      10   (5, 10]           0
1      4   10       7   (5, 10]           0
2     11   17       6   (5, 10]           0
3     18   26      12  (10, 15]           1
4     27   30      15  (10, 15]           1
5     31   40       6   (5, 10]           2
6     41   42       6   (5, 10]           2
"""
Define how each column is aggregated within a run (first start, last end, first bucket):
f = {'start': ['first'], 'end': ['last'], 'groups': ['first']}
Then aggregate each run with groupby(...).agg:
df.groupby('categories').agg(f)
"""
             groups  end start
              first last first
categories
0           (5, 10]   17     1
1          (10, 15]   30    18
2           (5, 10]   42    31
"""
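Because each value in f is a list, the result has two column levels (for example start / first). A sketch that reruns the answer's steps and flattens the columns with `droplevel`; it builds the run numbers from the integer bucket codes (a substitution for the answer's categorical comparison, not the answer's exact line) so the numbering does not depend on how a given pandas version compares a categorical with NaN:

```python
import pandas as pd

# The answer's sample data
df = pd.DataFrame([[1, 3, 10], [4, 10, 7], [11, 17, 6], [18, 26, 12],
                   [27, 30, 15], [31, 40, 6], [41, 42, 6]],
                  columns=['start', 'end', 'height'])
df['groups'] = pd.cut(df.height, [-1, 0, 5, 10, 15, 1000])

# Run numbers from the integer codes; the NaN from shift() compares unequal,
# so row 0 opens run 1
codes = df['groups'].cat.codes
df['categories'] = (codes != codes.shift()).cumsum()

f = {'start': ['first'], 'end': ['last'], 'groups': ['first']}
res = df.groupby('categories').agg(f)
# Each list in f adds a second column level ('first' / 'last'); keep only the top level
res.columns = res.columns.droplevel(1)
res = res.reset_index(drop=True)
print(res)
```

Passing plain strings instead of lists in f (for example 'first' rather than ['first']) would also give single-level columns.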
