pandas aggregate DataFrame returns only one column

Question

I have a pandas DataFrame (df) like this:

     foo  id1  bar  id2
0    8.0   1  NULL   1
1    5.0   1  NULL   1
2    3.0   1  NULL   1
3    4.0   1     1   2
4    7.0   1     3   2
5    9.0   1     4   3
6    5.0   1     2   3
7    7.0   1     3   1
...

I want to group by id1 and id2 and get the mean of foo and bar.

My code:

res = df.groupby(["id1","id2"])["foo","bar"].mean()

What I get is almost what I expect:

            foo
id1 id2          
1  1   5.750000
   2   7.000000
2  1   3.500000
   2   1.500000
3  1   6.000000
   2   5.333333

The values in column "foo" are exactly the means I am looking for, but where is my column "bar"?

If this were SQL, I would be looking for the result of: "select avg(foo), avg(bar) from dataframe group by id1, id2;" (Sorry for this, but I am more of an SQL person and new to pandas, and I need it now.)

What I also tried:

groupedFrame = res.groupby(["id1","id2"])
aggrFrame = groupedFrame.aggregate(numpy.mean)

This gives me exactly the same result, still missing column "bar".

Sites I have read:

  • http://wesmckinney.com/blog/groupby-fu-improvements-in-grouping-and-aggregating-data-in-pandas/
  • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.aggregate.html
  • and documentation for group-by but I cannot post the link here.

What am I doing wrong? Thanks in advance.

Recommended answer

The problem is that your column bar is not numeric, so the aggregation function omits it.

You can check the dtype of the omitted column; it is not numeric:

print (df['bar'].dtype)
object
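
As a minimal sketch of that check, the snippet below rebuilds the eight sample rows from the question and tests each column for a numeric dtype. It assumes that NULL is stored as the literal string "NULL" and that the other bar values are strings too, which the question does not state explicitly. The variable names foo_is_numeric and bar_is_numeric are illustrative only.

```python
import pandas as pd

# Sample rows from the question. Assumption: "NULL" is a literal string,
# so pandas stores the whole bar column as text rather than as numbers.
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": ["NULL", "NULL", "NULL", "1", "3", "4", "2", "3"],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

# is_numeric_dtype reports whether a column can be averaged directly
foo_is_numeric = pd.api.types.is_numeric_dtype(df["foo"])
bar_is_numeric = pd.api.types.is_numeric_dtype(df["bar"])
print(foo_is_numeric, bar_is_numeric)
```

Running df.dtypes on the whole frame gives the same information for every column at once.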

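For background, see the section on automatic exclusion of "nuisance" columns in the pandas group-by documentation.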

The solution is to convert the string values to numeric before aggregating. Use to_numeric with the parameter errors='coerce', which turns any value that cannot be converted into NaN:

df['bar'] = pd.to_numeric(df['bar'], errors='coerce')
# Select both columns with a list; recent pandas versions reject the tuple-style ["foo","bar"]
res = df.groupby(["id1","id2"])[["foo","bar"]].mean()
print (res)
          foo  bar
id1 id2           
1   1    5.75  3.0
    2    5.50  2.0
    3    7.00  3.0
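
Put together as a self-contained sketch, the conversion and aggregation could look like this. It again assumes that bar holds strings with "NULL" marking missing values. The named aggregation at the end is an optional variant that mirrors the SQL "select avg(foo), avg(bar) ... group by id1, id2"; the output column names avg_foo and avg_bar are made up for illustration.

```python
import pandas as pd

# Sample rows from the question (assumption: bar is stored as strings)
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": ["NULL", "NULL", "NULL", "1", "3", "4", "2", "3"],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

# Strings that cannot be parsed as numbers (here "NULL") become NaN
df["bar"] = pd.to_numeric(df["bar"], errors="coerce")

# mean() skips NaN by default, so missing bar values do not distort the result
res = df.groupby(["id1", "id2"])[["foo", "bar"]].mean()
print(res)

# Optional: named aggregation with flat columns, closer to the SQL result set
sql_like = (
    df.groupby(["id1", "id2"])
      .agg(avg_foo=("foo", "mean"), avg_bar=("bar", "mean"))
      .reset_index()
)
print(sql_like)
```

Depending on the pandas version, averaging a non-numeric column may raise an error instead of dropping the column silently; converting the column first avoids both outcomes.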

But if the column holds mixed data, real numeric values alongside placeholder strings, you can use replace instead:

df['bar'] = df['bar'].replace("NULL", np.nan)
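
A sketch of the mixed case, assuming the column holds real Python numbers alongside the placeholder string "NULL". Whether replace alone leaves the column with a numeric dtype can depend on the pandas version, so the explicit astype(float) is an addition made in this sketch and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumption: bar mixes real integers with the placeholder string "NULL"
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": ["NULL", "NULL", "NULL", 1, 3, 4, 2, 3],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

# Swap the placeholder for NaN, then force a float dtype explicitly
df["bar"] = df["bar"].replace("NULL", np.nan).astype(float)

res = df.groupby(["id1", "id2"])[["foo", "bar"]].mean()
print(res)
```

If the column may contain other non-numeric strings besides "NULL", to_numeric with errors='coerce' is the more forgiving choice, because astype(float) fails on any value it cannot convert.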
