Iterate over DataFrame Date Groups in order, with reference to previous group

Published: 2021/2/15 21:17:00 | Tags: python, pandas, dataframe, lambda, iteration

Question

I have a DataFrame df with a MultiIndex (Name, Date) that I need to process iteratively by Date, assigning values that depend on both the current Date's group and the previous Date's group.

AFAIK the best way to process DataFrame groups is with .apply, e.g. df.groupby('Date').apply(ifunc).

But how can I best do this when ifunc needs to reference the values from the previous Date group after that previous group has itself been processed by ifunc?

Here is an example of such an ifunc, operating on a df with columns ['Dollars', 'Weight', 'Return', 'HaveMax']:

    # (This might not be great python; coding improvements welcome!)
    # Lambda to add "AddDollars" to Names that don't already "HaveMax" "MaxDollars"
    def ifunc(group, previous):  # Arguments are df groups by Date
        group['HaveMax'] = previous['HaveMax']
        # Each Name's Dollars changed from the previous Date
        avgWeights = group['Weight'].mean()
        group['Dollars'] = group['Weight'] * previous['Dollars'] * group['Return'] / avgWeights
        # Now add "AddDollars" to Names that were under
        group.loc[group['HaveMax'] == False, 'Dollars'] = group[group['HaveMax'] == False]['Dollars'] + AddDollars
        # Update HaveMax for any Names that reached MaxDollars on this Date
        group.loc[group['HaveMax'] == False, 'HaveMax'] = group[group['HaveMax'] == False]['Dollars'] >= MaxDollars
        return group
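One caveat with the function above is worth sketching. Because group and previous carry different Date values in their (Name, Date) index, a direct assignment such as group['HaveMax'] = previous['HaveMax'] aligns on labels that never match and yields NaN. The small frames below are made up for illustration only; they show the misalignment and one possible way around it, which is to align on the Name level alone.

```python
import pandas as pd

# Two tiny illustrative groups (hypothetical data, not the question's sample).
idx_prev = pd.MultiIndex.from_tuples(
    [('A', '20210101'), ('B', '20210101')], names=['Name', 'Date'])
idx_curr = pd.MultiIndex.from_tuples(
    [('A', '20210102'), ('B', '20210102')], names=['Name', 'Date'])
previous = pd.DataFrame({'HaveMax': [False, True]}, index=idx_prev)
group = pd.DataFrame({'HaveMax': [False, False]}, index=idx_curr)

# Label-aligned assignment: no (Name, Date) label of `previous` exists in
# `group`, so every assigned value becomes NaN.
misaligned = group.copy()
misaligned['HaveMax'] = previous['HaveMax']

# Aligning on Name only: drop the Date level, then look up each Name.
prev_by_name = previous['HaveMax'].droplevel('Date')
names = group.index.get_level_values('Name')
aligned = group.copy()
aligned['HaveMax'] = prev_by_name.reindex(names).to_numpy()
```

Aligning by Name does not depend on both groups listing the Names in the same order, which a purely positional copy would require.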

Sample data:

    import pandas as pd

    AddDollars = 1.0
    MaxDollars = 10.0
    df = pd.DataFrame(data=[('A', '20210101', 9.0, 1.0, 0, False),
                            ('B', '20210101', 5.0, 1.0, 0, False),
                            ('C', '20210101', 5.0, 1.0, 0, True),
                            ('A', '20210102', 0.0, 1.0, 1.0, False),
                            ('B', '20210102', 0.0, 1.0, 1.0, False),
                            ('C', '20210102', 0.0, 1.0, 1.0, False)],
                      columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax')
                      ).set_index(['Name', 'Date'])

Desired output:

                   Dollars  Weight  Return  HaveMax
    Name Date
    A    20210101      9.0     1.0     0.0    False
    B    20210101      5.0     1.0     0.0    False
    C    20210101      5.0     1.0     0.0     True
    A    20210102     10.0     1.0     1.0     True
    B    20210102      6.0     1.0     1.0    False
    C    20210102      5.0     1.0     1.0     True

Recommended answer

Use groupby to iterate over the groups.

    import pandas as pd

    AddDollars = 1.0
    MaxDollars = 10.0
    df = pd.DataFrame(data=[('A', '20210101', 9.0, 1.0, 0, False),
                            ('B', '20210101', 5.0, 1.0, 0, False),
                            ('C', '20210101', 5.0, 1.0, 0, True),
                            ('A', '20210102', 0.0, 1.0, 1.0, False),
                            ('B', '20210102', 0.0, 1.0, 1.0, False),
                            ('C', '20210102', 0.0, 1.0, 1.0, False)],
                      columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax')
                      ).set_index(['Name', 'Date'])

    # Group by the Date level and visit the keys in sorted (chronological) order.
    dft = df.groupby(df.index.get_level_values('Date'))
    groupings = sorted(dft.groups.keys())

    # The first Date has no predecessor, so it is left unchanged.
    previous = dft.get_group(groupings[0])
    for gkey in groupings[1:]:
        # Work on a copy so the assignments below do not modify a view of df.
        group = dft.get_group(gkey).copy()
        # .values drops the index, so rows pair up by position
        # (this assumes both groups list the same Names in the same order).
        group['HaveMax'] = previous['HaveMax'].values
        avgWeights = group['Weight'].mean()
        group['Dollars'] = group['Weight'].values * previous['Dollars'].values * group['Return'].values / avgWeights
        group.loc[group['HaveMax'] == False, 'Dollars'] = group[group['HaveMax'] == False]['Dollars'] + AddDollars
        group.loc[group['HaveMax'] == False, 'HaveMax'] = group[group['HaveMax'] == False]['Dollars'] >= MaxDollars
        # Assign the calculated values back to the DataFrame:
        df.loc[group.index] = group
        # Prepare for next iteration:
        previous = group
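The loop above could also be wrapped so that the per-group logic stays in a function with the ifunc(group, previous) signature from the question. The following is only a sketch under some assumptions: apply_with_previous is a hypothetical helper name (not a pandas API), the constants are renamed ADD_DOLLARS and MAX_DOLLARS, and both Date groups are assumed to list the same Names in the same order, since values are paired by position.

```python
import pandas as pd

ADD_DOLLARS = 1.0   # assumed constant, same value as AddDollars above
MAX_DOLLARS = 10.0  # assumed constant, same value as MaxDollars above


def ifunc(group, previous):
    """Update one Date group from the already-processed previous group."""
    group = group.copy()
    # Positional pairing: assumes identical Name order in both groups.
    group['HaveMax'] = previous['HaveMax'].to_numpy()
    avg_weight = group['Weight'].mean()
    group['Dollars'] = (group['Weight'].to_numpy()
                        * previous['Dollars'].to_numpy()
                        * group['Return'].to_numpy() / avg_weight)
    under = ~group['HaveMax']  # Names that have not yet reached the maximum
    group.loc[under, 'Dollars'] += ADD_DOLLARS
    group.loc[under, 'HaveMax'] = group.loc[under, 'Dollars'] >= MAX_DOLLARS
    return group


def apply_with_previous(df, func, level='Date'):
    """Hypothetical helper: call func(group, previous) in sorted key order."""
    results = []
    previous = None
    for _, group in df.groupby(level=level, sort=True):
        # The first group has no predecessor and is passed through unchanged.
        current = group if previous is None else func(group, previous)
        results.append(current)
        previous = current
    return pd.concat(results)


df = pd.DataFrame(
    data=[('A', '20210101', 9.0, 1.0, 0, False),
          ('B', '20210101', 5.0, 1.0, 0, False),
          ('C', '20210101', 5.0, 1.0, 0, True),
          ('A', '20210102', 0.0, 1.0, 1.0, False),
          ('B', '20210102', 0.0, 1.0, 1.0, False),
          ('C', '20210102', 0.0, 1.0, 1.0, False)],
    columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax'),
).set_index(['Name', 'Date'])

result = apply_with_previous(df, ifunc)
```

Unlike the in-place loop, this version builds a new DataFrame and leaves df untouched. Each call receives the processed output of the previous Date, which a plain groupby(...).apply(...) does not provide because it handles each group independently.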
