Pandas: Difference between largest and smallest value within group

Views: 141 | Published: 2020/5/18 18:46:17 | Tags: python, pandas, numpy

Problem description

Given a data frame that looks like this:

GROUP VALUE
  1     5
  2     2
  1     10
  2     20
  1     7

I would like to compute the difference between the largest and smallest value within each group. That is, the result should be:

GROUP   DIFF
  1      5
  2      18

What is an easy way to do this in Pandas?

What is a fast way to do this in Pandas for a data frame with about 2 million rows and 1 million groups?

Recommended answer

Using @unutbu's df.

Per the timings, unutbu's solution is the best choice for large data sets.

import pandas as pd
import numpy as np

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

df.groupby('GROUP')['VALUE'].agg(np.ptp)

GROUP
1     5
2    18
Name: VALUE, dtype: int64

According to the np.ptp docs, it returns the range (maximum minus minimum) of an array.
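Since np.ptp on a plain array is simply max minus min, the same per-group result can also be written with the built-in groupby max and min reductions. The sketch below is an illustration and not code from the answer; the names `values`, `grouped`, `diff` and `result` are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

# np.ptp ("peak to peak") is max - min of an array
values = np.array([5, 10, 7])
assert np.ptp(values) == values.max() - values.min()

# Same idea per group, using groupby's built-in max/min reductions
grouped = df.groupby('GROUP')['VALUE']
diff = grouped.max() - grouped.min()

# Reshape into the GROUP / DIFF layout asked for in the question
result = diff.rename('DIFF').reset_index()
print(result)
```

The final `rename` and `reset_index` calls are only there to get the two-column GROUP/DIFF layout from the question; `diff` on its own is a Series indexed by GROUP.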

Timing
Small df

Large df
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 100, VALUE=np.random.rand(1000000)))

Large df, many groups
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=np.random.rand(1000000)))
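The timing figures themselves are not reproduced here, so the following is a rough sketch of how such a comparison could be rerun with the standard timeit module on the "many groups" frame. The helper names `ptp_agg` and `max_minus_min` are hypothetical, and the max-minus-min variant is an assumed alternative for comparison, not necessarily the solution the answer refers to. Results will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# The "large df, many groups" frame from above
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000,
                       VALUE=np.random.rand(1000000)))

def ptp_agg(frame):
    # The approach from the answer: np.ptp applied per group
    return frame.groupby('GROUP')['VALUE'].agg(np.ptp)

def max_minus_min(frame):
    # Assumed alternative: built-in max and min reductions, then subtract
    grouped = frame.groupby('GROUP')['VALUE']
    return grouped.max() - grouped.min()

# Both candidates should produce the same per-group ranges
assert np.allclose(ptp_agg(df), max_minus_min(df))

for func in (ptp_agg, max_minus_min):
    # Best of 3 single runs, in seconds
    best = min(timeit.repeat(lambda: func(df), number=1, repeat=3))
    print(f'{func.__name__}: {best:.4f} s')
```

Taking the minimum over a few repeats is a common way to reduce noise from other processes; changing the modulus in the GROUP column lets the same harness cover the few-groups and many-groups cases.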
