Pandas groupby: how to select adjacent column data after selecting a row based on data in another column within groupby groups?

Problem description

I have a database, partially shown below. For each date there are several durations (1-20 per date), and each duration lists many items (hundreds). Each item has several associated data points in adjacent columns, including an identifier. For each date, I want to select the largest duration. Within that duration, I want to find the item whose Value is closest to a given input value. I would then like to obtain the ID of that item so that I can follow its value through its time in the database.

Index Date      Duration Item   Value  ID
0     1/1/2018     30     100      4    a
1     1/1/2018     30     200      8    b
2     1/1/2018     30     300     20    c
3     1/1/2018     60     100      9    d
4     1/1/2018     60     200     19    e
5     1/1/2018     60     300     33    f
6     1/1/2018     60     400     50    g
7     1/2/2018     31     100      3    a
8     1/2/2018     31     200      7    b
9     1/2/2018     31     300     20    c
10    1/2/2018     61     100      8    d
11    1/2/2018     61     200     17    e
12    1/2/2018     61     300     30    f

I thought the pandas groupby function would be ideal for creating the date/duration groups:

df = df.groupby('Date')['Duration'].max()   #creates the correct groups of max duration for each date

Without groupby, the data can be obtained by finding the correct row, for instance:

# index label of the row whose Value is closest to target_value
row = df.index[(df['Value'] - target_value).abs().argsort()[:1]]
id = df.loc[row, 'ID']

But that doesn't work within groupby groups. I've tried to solve this with other pandas operations, but cannot figure out how to obtain the ID after selecting the item with the correct Value. There are many questions on SO about extracting data from specific columns (or applying functions to specific columns) after pandas.groupby, but I didn't find anything on selecting data in adjacent columns. I would appreciate it if you could point me in the right direction.

Recommended answer

You could do something like the following:

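import numpy as np

target_value = 15
# broadcast each date's maximum duration onto every row of that date
df['max_duration'] = df.groupby('Date')['Duration'].transform('max')
# keep the max-duration rows, then the row(s) closest to target_value per date
df.query('max_duration == Duration')\
  .assign(dist=lambda df: np.abs(df['Value'] - target_value))\
  .assign(min_dist=lambda df: df.groupby('Date')['dist'].transform('min'))\
  .query('min_dist == dist')\
  .loc[:, ['Date', 'ID']]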

Result:

        Date ID
4   1/1/2018  e
11  1/2/2018  e
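
As a self-contained alternative, the same selection can be sketched with idxmin instead of the two transform/query passes. This is only a sketch under some assumptions: the DataFrame is rebuilt by hand from the sample rows in the question, and closest_ids is a hypothetical helper name, not part of pandas. It also shows how the selected ID might be used to follow an item through the table, as the question asks.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "Date": ["1/1/2018"] * 7 + ["1/2/2018"] * 6,
    "Duration": [30, 30, 30, 60, 60, 60, 60, 31, 31, 31, 61, 61, 61],
    "Item": [100, 200, 300, 100, 200, 300, 400, 100, 200, 300, 100, 200, 300],
    "Value": [4, 8, 20, 9, 19, 33, 50, 3, 7, 20, 8, 17, 30],
    "ID": list("abcdefgabcdef"),
})


def closest_ids(frame, target_value):
    """Hypothetical helper: per date, the ID closest to target_value at max duration."""
    # Keep only the rows whose Duration is the maximum for their Date.
    max_duration = frame.groupby("Date")["Duration"].transform("max")
    longest = frame[frame["Duration"] == max_duration]
    # Absolute distance of each remaining Value from the target.
    dist = (longest["Value"] - target_value).abs()
    # idxmin gives the index label of the smallest distance in each Date group.
    # On ties it keeps only the first row, unlike the query-based version above.
    best_rows = dist.groupby(longest["Date"]).idxmin()
    return longest.loc[best_rows.to_numpy(), ["Date", "ID"]]


result = closest_ids(df, target_value=15)
print(result)

# Follow one selected item through the whole table by its ID.
history = df[df["ID"] == result["ID"].iloc[0]]
print(history[["Date", "Duration", "Value"]])
```

Because the original index labels are preserved, the selected rows can still be looked up in df with .loc if other adjacent columns are needed.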
