pandas reset index after performing groupby and retain selective columns


Question

I want to take a pandas dataframe, count the unique elements per group of one column, and retain only 2 of the columns. But after the groupby I get a multi-index dataframe that I am unable to (1) flatten or (2) reduce to only the relevant columns. Here is my code:

import pandas as pd
df = pd.DataFrame({
'ID':[1,2,3,4,5,1],
'Ticker':['AA','BB','CC','DD','CC','BB'],
'Amount':[10,20,30,40,50,60],
'Date_1':['1/12/2018','1/14/2018','1/12/2018','1/14/2018','2/1/2018','1/12/2018'],
'Random_data':['ax','','nan','','by','cz'],
'Count':[23,1,4,56,34,53]
})

df2 = df.groupby(['Ticker']).agg(['nunique'])

df2.reset_index()

print(df2)

df2 still comes out with two levels of index, and it has all the columns: Amount, Count, Date_1, ID, Random_data.

How do I reduce it to one level of index, and retain only the ID and Random_data columns?

Recommended answer

Try the following:

1) Select only the relevant columns (['ID', 'Random_data']) on the grouped object before aggregating.

2) Don't pass a list to .agg, just the string 'nunique'. Passing a list is what causes the multi-index behaviour: each column gets a second label level naming the function applied.

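# Double brackets select a list of columns; reset_index() returns a new frame, so assign it.
df2 = df.groupby(['Ticker'])[['ID', 'Random_data']].agg('nunique')
df2 = df2.reset_index()
print(df2)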

  Ticker  ID  Random_data
0     AA   1            1
1     BB   2            2
2     CC   2            2
3     DD   1            1
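
To make the difference concrete, here is a minimal sketch that contrasts the list and string forms of .agg and shows two other ways to end up with the same flat frame. It assumes a recent pandas release and uses a trimmed copy of the question's data. The names cols, nested, flat, flattened and result are illustrative only and are not part of the original answer.

```python
import pandas as pd

# Trimmed copy of the question's data (only the columns needed here).
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 1],
    'Ticker': ['AA', 'BB', 'CC', 'DD', 'CC', 'BB'],
    'Random_data': ['ax', '', 'nan', '', 'by', 'cz'],
})

cols = ['ID', 'Random_data']

# A list of functions yields two-level column labels such as ('ID', 'nunique').
nested = df.groupby('Ticker')[cols].agg(['nunique'])
print(nested.columns.nlevels)

# A single function name keeps one level of column labels.
flat = df.groupby('Ticker')[cols].agg('nunique')
print(flat.columns.nlevels)

# If the two-level frame already exists, drop the inner column level,
# then move Ticker from the index back into an ordinary column.
flattened = nested.droplevel(1, axis=1).reset_index()
print(flattened)

# as_index=False keeps Ticker as a column from the start,
# so no reset_index() call is needed afterwards.
result = df.groupby('Ticker', as_index=False)[cols].agg('nunique')
print(result)
```

Note that reset_index() and droplevel() both return new objects rather than modifying df2 in place, which is why calling df2.reset_index() without assigning the result, as in the question, appears to have no effect.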
