数据透视表按索引聚合 [英] Pivot table aggregation by index

查看:171
本文介绍了数据透视表按索引聚合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个数据框,其中包含有关学生成绩和考试成绩的信息.第一个看起来像这样:

I have two dataframes containing information about student grades and test scores. The first looks like this:

     ID    Test_Score    Class1    Class2    Class3
0    001   85            B-        A         C+
1    002   78            B         NaN       B+
2    003   93            A         B         NaN
...

第二个看起来像这样:

     0                1
0    Algebra          A
0    Calculus_1       B
0    Calculus_2       C-
1    Algebra          C+
1    Trig             F
1    Trig             C
1    Calculus_1       C-
...

每个数据框中的索引指的是同一位学生.因此,两个数据帧中索引为0的学生都是相同的.

The indices in each dataframe refer to the same student. So the student at index 0 is the same in both dataframes.

我想做的是从第二个数据框中创建一个数据透视表,其中的行对应于学生(即索引),列是数学课,值是他们在每个课中取得的最高分(因为学生有可能不止一次上课).然后,我将其连接到第一个数据帧.

What I want to do is create a pivot table from the second dataframe, where the rows correspond to students (i.e. indices), the columns are the math classes, and the values are the highest grade they made in each class (since it's possible for a student to have taken a class more than once). I will then concatenate this to the first dataframe.

我已经写了一个聚合函数,可以找到最高的分数.

I have already written an aggregation function that finds the highest grade.

我尝试以下操作,因为它会失败:

I tried the following knowing it would fail:

p = pd.pivot_table(u, columns=0, values=1, aggfunc=highest)

我得到的是单行,而不是为每个学生返回一行数据,而是包含给定班级中任何学生的最高成绩:

Instead of returning a dataframe with a row for each student, I get a single row that contains the highest grade that any student made in the given class:

     Algebra   Trig    Precalculus   Calculus_1   Calculus_2
1    A         A+      A+            A            A

在我脑海中解决问题的方式,我只需要汇总共享索引的行,就不能确定该怎么做.

The way I've framed the problem in my mind, I just need to aggregate across the rows which share in index, which I am unsure how to do.

我也愿意接受完全不同的方法.

I'm also open to an entirely different approach.

所需的输出:

     ID    Test_Score    Class1    Class2    Class3   Algebra   Trig ...
0    001   85            B-        A         C+       A         NaN
1    002   78            B         NaN       B+       C+        C
2    003   93            A         B         NaN      B         B-
...

这是我的最高"功能的代码:

Here is the code for my 'highest' function:

def highest(x):
    q = 0
    z = None
    for g in x:
        if qpoints(g) > q:
            q = qpoints(g)
            z = g
    return z

其中qpoints是我已经在其他地方使用的以下函数:

where qpoints is the following function that I was already using elsewhere:

def qpoints(x):
    qvalue = {'W': 0,
              'F': 0,
              'D': 1.0,
              'D+': 1.33,
              'C-': 1.67,
              'C': 2.0,
              'C+': 2.33,
              'B-': 2.67,
              'B': 3.0,
              'B+': 3.33,
              'A-': 3.84,
              'A': 4.0,
              'A+': 4.0}
    return qvalue[x]

推荐答案

您的问题是,您在pivot_table时需要保留第二个数据帧的索引值,请参见

You problem is that you need to keep the index values of the second dataframe when you pivot_table, see this answer for understanding. So if you do:

print (df2.reset_index().pivot_table(index='index', values=[1], columns=[0], 
                                     aggfunc= lambda x: sorted(x)[0])) 
                                     # I used my own idea of highest function
            1                           
0     Algebra Calculus_1 Calculus_2 Trig
index                                   
0           A          B         C-  NaN
1          C+         C-        NaN    C

然后您可以 join ,例如:

then you can join such as:

df_p = df2.reset_index().pivot_table(index='index', values=[1], columns=[0], 
                                                aggfunc= lambda x: sorted(x)[0])
df_p.columns = [col[1] for col in df_p.columns]
new_df = df1.join(df_p)

print (new_df)
    ID  Test_Score Class1 Class2 Class3 Algebra Calculus_1 Calculus_2 Trig
0  001          85     B-      A     C+       A          B         C-  NaN
1  002          78      B    NaN     B+      C+         C-        NaN    C
2  003          93      A      B    NaN     NaN        NaN        NaN  NaN

这篇关于数据透视表按索引聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆