How to groupby multiple columns and aggregate diff on different columns?

Views: 51 Published: 2021/6/14 18:34:20 Tags: python pandas pandas-groupby

Question

I am looking for help on how to do this in Python / pandas:

I want to take the original data (below) and find the daily difference of multiple columns (cnt_a and cnt_b) by a group with multiple columns (state, county and date).

I've tried it different ways, and I can't seem to get past the "check for duplicate" issue:

df.cnt_a = df.sort_values(['state','county','date']).groupby['state','county','date','cnt_a'].diff(-1)

I tried splitting it out to fix one thing at a time:

df1 = df.sort_values(['state','county','date'])

df2 = df1.groupby(['state','county'])['cnt_a'].diff()

Original data => df

date        county  state       cnt_a    cnt_b
2020-06-13  Bergen  New Jersey   308     11
2020-06-14  Bergen  New Jersey   308     11
2020-06-15  Bergen  New Jersey   320     15
2020-06-12  Union   New Jersey   100     3
2020-06-13  Union   New Jersey   130     4
2020-06-14  Union   New Jersey   150     5
2020-06-12  Bronx   New York     200     100
2020-06-13  Bronx   New York     210     200

Desired output

date        county  state       cnt_a   cnt_b   daydiff_a    daydiff_b
2020-06-13  Bergen  New Jersey   308     11        0            0 
2020-06-14  Bergen  New Jersey   308     11        0            0
2020-06-15  Bergen  New Jersey   320     15        12           4
2020-06-12  Union   New Jersey   100     3         0            0
2020-06-13  Union   New Jersey   130     4         30           1
2020-06-14  Union   New Jersey   150     5         20           1
2020-06-12  Bronx   New York     200     100       0            0 
2020-06-13  Bronx   New York     210     200       10           100

Recommended answer

It's important to sort df first, because .diff() computes each row minus the previous row within its group. If df isn't sorted by date within each group, the differences will not be day-over-day changes.

Be certain to sort df, in order, by 'state', 'county', and 'date'; the 'date' column is used only for sorting and is left out of .groupby.
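
To illustrate why the sort matters, here is a minimal sketch on a hypothetical three-row sample (one county, rows deliberately out of date order; the `sample` frame is made up for illustration and is not part of the original data). It compares the grouped diff before and after sorting:

```python
import pandas as pd

# Hypothetical sample: one county, rows deliberately out of date order
sample = pd.DataFrame({
    'date': pd.to_datetime(['2020-06-14', '2020-06-12', '2020-06-13']),
    'county': ['Union', 'Union', 'Union'],
    'state': ['New Jersey', 'New Jersey', 'New Jersey'],
    'cnt_a': [150, 100, 130],
})

# Without sorting, diff() subtracts whichever row happens to come before
unsorted_diff = sample.groupby(['state', 'county'])['cnt_a'].diff()

# After sorting by date within each group, each value is a day-over-day change
sorted_sample = sample.sort_values(['state', 'county', 'date'])
sorted_diff = sorted_sample.groupby(['state', 'county'])['cnt_a'].diff()

print(unsorted_diff)
print(sorted_diff)
```

In both cases the result keeps the index of the frame it was computed from, so it can be joined back on the index afterwards.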

Specify rsuffix, and/or use .rename, to change the column headers.

import pandas as pd

# setup the test dataframe
data = {'date': ['2020-06-13', '2020-06-14', '2020-06-15', '2020-06-12', '2020-06-13', '2020-06-14', '2020-06-12', '2020-06-13'],
        'county': ['Bergen', 'Bergen', 'Bergen', 'Union', 'Union', 'Union', 'Bronx', 'Bronx'],
        'state': ['New Jersey', 'New Jersey', 'New Jersey', 'New Jersey', 'New Jersey', 'New Jersey', 'New York', 'New York'],
        'cnt_a': [308, 308, 320, 100, 130, 150, 200, 210],
        'cnt_b': [11, 11, 15, 3, 4, 5, 100, 200]}

df = pd.DataFrame(data)

# set the date column to a datetime format
df.date = pd.to_datetime(df.date)

# sort the values
df = df.sort_values(['state', 'county', 'date'])

# groupby and join back to dataframe df
df = df.join(df.groupby(['state', 'county'])[['cnt_a', 'cnt_b']].diff().fillna(0), rsuffix='_diff')

# display(df)
        date  county       state  cnt_a  cnt_b  cnt_a_diff  cnt_b_diff
0 2020-06-13  Bergen  New Jersey    308     11         0.0         0.0
1 2020-06-14  Bergen  New Jersey    308     11         0.0         0.0
2 2020-06-15  Bergen  New Jersey    320     15        12.0         4.0
3 2020-06-12   Union  New Jersey    100      3         0.0         0.0
4 2020-06-13   Union  New Jersey    130      4        30.0         1.0
5 2020-06-14   Union  New Jersey    150      5        20.0         1.0
6 2020-06-12   Bronx    New York    200    100         0.0         0.0
7 2020-06-13   Bronx    New York    210    200        10.0       100.0
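
The join above yields float columns named cnt_a_diff and cnt_b_diff, while the desired output in the question shows integer columns named daydiff_a and daydiff_b. A possible variant, sketched here under the assumption that filling each group's first row with 0 is acceptable, uses .rename instead of rsuffix and casts the result to int (the names `diffs` and `result` are introduced only for this illustration):

```python
import pandas as pd

# Same test data as in the answer above
data = {'date': ['2020-06-13', '2020-06-14', '2020-06-15', '2020-06-12',
                 '2020-06-13', '2020-06-14', '2020-06-12', '2020-06-13'],
        'county': ['Bergen', 'Bergen', 'Bergen', 'Union', 'Union', 'Union', 'Bronx', 'Bronx'],
        'state': ['New Jersey', 'New Jersey', 'New Jersey', 'New Jersey',
                  'New Jersey', 'New Jersey', 'New York', 'New York'],
        'cnt_a': [308, 308, 320, 100, 130, 150, 200, 210],
        'cnt_b': [11, 11, 15, 3, 4, 5, 100, 200]}

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['state', 'county', 'date'])

# Per-group differences; the first row of each group is NaN, filled with 0,
# which makes the cast back to int safe
diffs = (df.groupby(['state', 'county'])[['cnt_a', 'cnt_b']]
           .diff()
           .fillna(0)
           .astype(int)
           .rename(columns={'cnt_a': 'daydiff_a', 'cnt_b': 'daydiff_b'}))

# The renamed columns no longer clash with df, so no suffix is needed
result = df.join(diffs)
print(result)
```

Renaming before the join avoids the column-name overlap altogether, which is why rsuffix can be dropped here.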
