Pandas: Difference between largest and smallest value within group
Question
Given a data frame that looks like this:
GROUP VALUE
1 5
2 2
1 10
2 20
1 7
I would like to compute the difference between the largest and smallest value within each group. That is, the result should be:
GROUP DIFF
1 5
2 18
What is an easy way to do this in Pandas?
What is a fast way to do this in Pandas for a data frame with about 2 million rows and 1 million groups?
Recommended answer
Using @unutbu's df.
Per the timings, unutbu's solution is best over large data sets.
import pandas as pd
import numpy as np
df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})
df.groupby('GROUP')['VALUE'].agg(np.ptp)
GROUP
1 5
2 18
Name: VALUE, dtype: int64
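For comparison, the same difference can also be written with the built-in groupby reductions `max` and `min`, which avoids calling a Python function once per group and so tends to scale better when there are many groups. This is only a sketch: whether it matches the solution the timings refer to is an assumption, and the names `grouped`, `diff`, and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

# Group once, then subtract the per-group minimum from the per-group maximum.
grouped = df.groupby('GROUP')['VALUE']
diff = (grouped.max() - grouped.min()).rename('DIFF')

# Turn the GROUP index back into a column to match the layout in the question.
result = diff.reset_index()
print(result)
```

The `reset_index()` call is optional. Without it the result is a Series indexed by GROUP, like the `np.ptp` output above.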
According to its docs, np.ptp returns the range of an array (maximum minus minimum).
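As a quick check of what `np.ptp` computes, here is a minimal sketch on the VALUE entries of group 1 from the question. The variable names are illustrative only.

```python
import numpy as np

# VALUE entries belonging to GROUP 1 in the question's data frame.
values = np.array([5, 10, 7])

# "Peak to peak": the maximum minus the minimum of the array.
peak_to_peak = np.ptp(values)

# Same number as computing the range by hand.
assert peak_to_peak == values.max() - values.min()
print(peak_to_peak)  # 5
```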
Timing
small df
large df
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 100, VALUE=np.random.rand(1000000)))
large df, many groups
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=np.random.rand(1000000)))
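To reproduce a comparison on the frame above, a rough harness with the standard-library `timeit` module could look like the following. It is a sketch under assumptions: the helper names `ptp_agg` and `max_minus_min` are made up for illustration, the second helper is just one plausible alternative rather than necessarily the solution referred to earlier, and the numbers it prints depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd


def ptp_agg(df):
    # Approach from the answer: apply np.ptp to each group.
    return df.groupby('GROUP')['VALUE'].agg(np.ptp)


def max_minus_min(df):
    # Alternative for comparison: built-in max and min reductions.
    grouped = df.groupby('GROUP')['VALUE']
    return grouped.max() - grouped.min()


# Same "large df, many groups" frame as above.
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=np.random.rand(1000000)))

timings = {}
for func in (ptp_agg, max_minus_min):
    # Best of three single runs, in seconds.
    timings[func.__name__] = min(timeit.repeat(lambda: func(df), number=1, repeat=3))

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Changing the modulus in `np.arange(1000000) % 10000` varies the number of groups, which is the variable the three timing cases above differ in.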