How to aggregate only the numerical columns in a mixed-dtypes DataFrame

Views: 61 · Posted: 2021/6/14 18:32:39 · Tags: python pandas aggregate aggregate-functions pandas-groupby

Question

I have a pd.DataFrame with mixed dtypes:

import pandas as pd
import numpy as np
df = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20130102'),
                     'C' : pd.Timestamp('20180101'),
                     'D' : np.random.rand(10),
                     'F' : 'foo' })

df
Out[12]: 
     A          B          C         D    F
0  1.0 2013-01-02 2018-01-01  0.592533  foo
1  1.0 2013-01-02 2018-01-01  0.819248  foo
2  1.0 2013-01-02 2018-01-01  0.298035  foo
3  1.0 2013-01-02 2018-01-01  0.330128  foo
4  1.0 2013-01-02 2018-01-01  0.371705  foo
5  1.0 2013-01-02 2018-01-01  0.541246  foo
6  1.0 2013-01-02 2018-01-01  0.976108  foo
7  1.0 2013-01-02 2018-01-01  0.423069  foo
8  1.0 2013-01-02 2018-01-01  0.863764  foo
9  1.0 2013-01-02 2018-01-01  0.037085  foo

I would like to aggregate my numerical columns but also keep the non-numerical ones. If I do a groupby followed by agg, I get:

df.groupby('B').agg(np.median)
Out[13]: 
              A         D
B                        
2013-01-02  1.0  0.482157
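The result above comes from pandas silently dropping the "nuisance" columns C and F. Newer pandas releases (2.x) removed that silent dropping, so the same call may raise a TypeError there. A minimal sketch that asks for the numeric-only result explicitly, assuming a pandas version whose groupby `median` accepts `numeric_only`:

```python
import numpy as np
import pandas as pd

# Same mixed-dtype frame as in the question.
df = pd.DataFrame({"A": 1.0,
                   "B": pd.Timestamp("20130102"),
                   "C": pd.Timestamp("20180101"),
                   "D": np.random.rand(10),
                   "F": "foo"})

# Request the numeric-only reduction explicitly instead of relying on
# pandas to drop the datetime column C and the string column F for us.
result = df.groupby("B").median(numeric_only=True)
print(result)
```

This only makes the existing behavior explicit; C and F are still left out of the result.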

This is fine, and I know it is the intended behavior, since the other dtypes would probably raise exceptions inside np.median. However, I would also like to keep my original column F with the value foo, as well as C with 2018-01-01.
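One way to keep every column, sketched under the assumption that the non-numeric columns are constant within each group (as C and F are here), is to pass agg a per-column mapping: a numeric reducer for numeric columns and "first" for everything else. The helper name `build_agg_spec` and its defaults are hypothetical, not a pandas API; `is_numeric_dtype` comes from `pandas.api.types` and treats datetime columns as non-numeric.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({"A": 1.0,
                   "B": pd.Timestamp("20130102"),
                   "C": pd.Timestamp("20180101"),
                   "D": np.random.rand(10),
                   "F": "foo"})


def build_agg_spec(frame, key, numeric_func="median", other_func="first"):
    """Hypothetical helper: map every non-key column to an aggregation name.

    Numeric columns get `numeric_func`; datetime, string and other
    non-numeric columns get `other_func`.
    """
    return {
        col: numeric_func if is_numeric_dtype(frame[col]) else other_func
        for col in frame.columns
        if col != key
    }


spec = build_agg_spec(df, "B")
# spec == {'A': 'median', 'C': 'first', 'D': 'median', 'F': 'first'}
result = df.groupby("B").agg(spec)
print(result)
```

If a non-numeric column can vary within a group, "first" silently picks one value, so a different reducer (or an explicit check) would be needed.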

So far, I have worked around this with a custom wrapper around my numerical aggregation functions, e.g. if I wanted to compute a nanmedian over my DataFrame:

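def my_nan_median(x):
    # x is one column of one group, passed in by groupby(...).agg
    if isinstance(x.values[0], np.datetime64):
        return np.min(x)  # datetimes: pass through the earliest value
    elif isinstance(x.values[0], str):
        return x.values[0]  # strings: pass through the first value
    else:
        return np.nanmedian(x)  # numbers: NaN-aware median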

But it looks awful. What is the right way to do this?
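A possibly tidier version of the wrapper above dispatches on the dtype of the whole column instead of on the type of its first element, so a missing first value (e.g. None in a string column) cannot send a column down the wrong branch. This is a sketch: the name `nanmedian_or_first` is hypothetical, and it returns the group's first value for every non-numeric column (rather than np.min for datetimes), which assumes those columns are constant within each group.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({"A": 1.0,
                   "B": pd.Timestamp("20130102"),
                   "C": pd.Timestamp("20180101"),
                   "D": np.random.rand(10),
                   "F": "foo"})


def nanmedian_or_first(x: pd.Series):
    """Hypothetical wrapper: NaN-aware median for numeric columns,
    first value of the group for everything else (datetimes, strings)."""
    if is_numeric_dtype(x.dtype):
        return np.nanmedian(x.to_numpy(dtype=float))
    return x.iloc[0]


# A single callable passed to agg is applied to each non-grouping
# column of each group separately.
result = df.groupby("B").agg(nanmedian_or_first)
print(result)
```

Checking the column dtype once, rather than inspecting values, keeps the branching independent of what happens to be in the first row.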
