Number of unique values per column by group

Question

Consider the following dataframe:

      A      B  E
0   bar    one  1
1   bar  three  1
2  flux    six  1
3  flux  three  2
4   foo   five  2
5   foo    one  1
6   foo    two  1
7   foo    two  2

I would like to find, for each value of A, the number of unique values in the other columns.

  1. I thought the following would do it:

df.groupby('A').apply(lambda x: x.nunique())

But I get an error:

AttributeError: 'DataFrame' object has no attribute 'nunique'

  • I also tried with:

    df.groupby('A').nunique()
    

    But I also get an error:

    AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'
    

  • Finally I tried with:

    df.groupby('A').apply(lambda x: x.apply(lambda y: y.nunique()))
    

    which returns:

          A  B  E
    A            
    bar   1  2  1
    flux  1  2  2
    foo   1  3  2
    

    which seems to be correct. Strangely though, it also returns the column A in the result. Why?

    Recommended answer

    In the pandas version that raises these errors, the DataFrame object has no nunique method; only Series does. You have to pick out which column you want to apply nunique() to. You can do this with simple attribute (dot) access:

    df.groupby('A').apply(lambda x: x.B.nunique())
    

    will print:

    A
    bar     2
    flux    2
    foo     3
    

    and doing

    df.groupby('A').apply(lambda x: x.E.nunique())
    

    will print:

    A
    bar     1
    flux    2
    foo     2
    

    Alternatively, you can do this with one function call using:

    df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()})
    

    will print:

          B  E
    A
    bar   2  1
    flux  2  2
    foo   3  2
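
The aggregate call above can also be written with the string name of the method instead of a lambda. The following is a minimal sketch, assuming a pandas version whose agg accepts the alias "nunique"; the frame is rebuilt from the question's data, and the variable name counts is only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})

# Count the distinct values of B and E within each group of A.
# Only B and E are listed, so A appears as the index, not as a column.
counts = df.groupby("A").agg({"B": "nunique", "E": "nunique"})
print(counts)
```

Newer pandas releases also expose nunique directly on DataFrame and on the groupby object, so the errors shown in the question may not occur there; treat that as something to check against the installed version.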
    

    To answer your question about why your nested lambda returns the A column as well: when you do a groupby/apply operation, you iterate over three DataFrame objects, one per group (bar, flux, foo). Each one is a sub-DataFrame of the original and still contains column A. Applying an operation to it applies the operation to each of its Series, and each sub-DataFrame has three Series (A, B and E) that nunique() is applied to.

    The first Series evaluated on each DataFrame is the A Series, and since you've done a groupby on A, each sub-DataFrame has only one unique value in its A Series. This explains why you ultimately get an A result column with all 1's.
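
To see this directly, one can iterate over the groups and inspect each sub-DataFrame. This is a small sketch using the question's data; the dictionary name a_counts is hypothetical and used only for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})

# Iterating a groupby yields (group key, sub-DataFrame) pairs.
a_counts = {}
for key, sub in df.groupby("A"):
    # The grouping column A is still one of the sub-DataFrame's columns,
    # and it holds a single distinct value: the group key itself.
    a_counts[key] = (list(sub.columns), sub["A"].nunique())

print(a_counts)
```

Every group reports the columns A, B and E and exactly one distinct value in A, which is why the nested apply produces a column of 1's for A.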
