pandas aggregate dataframe returns only one column

Published: 2020/5/24 3:54:21   Tags: python pandas group-by aggregate

Problem description

I have a pandas DataFrame (df) like this:

     foo  id1  bar  id2
0    8.0   1  NULL   1
1    5.0   1  NULL   1
2    3.0   1  NULL   1
3    4.0   1     1   2
4    7.0   1     3   2
5    9.0   1     4   3
6    5.0   1     2   3
7    7.0   1     3   1
...

I want to group by id1 and id2 and get the mean of foo and bar.

My code:

res = df.groupby(["id1", "id2"])[["foo", "bar"]].mean()

What I get is almost what I expect:

            foo
id1 id2          
1  1   5.750000
   2   7.000000
2  1   3.500000
   2   1.500000
3  1   6.000000
   2   5.333333

The values in column "foo" are exactly the average values (means) that I am looking for but where is my column "bar"?

In SQL terms, I am looking for the result of: "select avg(foo), avg(bar) from dataframe group by id1, id2;" (Sorry for this, but I am more of an SQL person, new to pandas, and I need it now.)
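For comparison, the SQL query above corresponds roughly to a pandas group-by with named aggregation (available since pandas 0.25). The sketch below is a hypothetical reconstruction of the visible sample rows and assumes bar already holds numbers, with NULL represented as NaN; in the question's actual data that assumption does not hold, which is the issue the answer addresses.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample rows, ASSUMING "bar" is numeric
# and SQL NULL is stored as NaN
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": [np.nan, np.nan, np.nan, 1, 3, 4, 2, 3],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

# select avg(foo) as foo, avg(bar) as bar from dataframe group by id1, id2
res = df.groupby(["id1", "id2"]).agg(
    foo=("foo", "mean"),
    bar=("bar", "mean"),
)
print(res)
```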

What I also tried:

groupedFrame = res.groupby(["id1","id2"])
aggrFrame = groupedFrame.aggregate(numpy.mean)

This gives me exactly the same result; column "bar" is still missing.

Pages I have read:

http://wesmckinney.com/blog/groupby-fu-improvements-in-grouping-and-aggregating-data-in-pandas/
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.aggregate.html
and the documentation for group-by, but I cannot post the link here.

What am I doing wrong? Thanks in advance.

Recommended answer

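The problem is that your column bar is not numeric, so the aggregation function silently omits it. (Dropping such "nuisance" columns is the behavior of older pandas versions; since pandas 2.0, a group-by mean over a non-numeric column raises a TypeError instead.)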

You can check the dtype of the omitted column; it is not numeric:

print(df['bar'].dtype)
object
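To reproduce the check, here is a sketch built on a hypothetical reconstruction of the sample rows, assuming the missing values arrive as the literal string "NULL" mixed with integers (the question does not say how the data was loaded). Besides dtype, select_dtypes can list every column that is not numeric.

```python
import pandas as pd

# Hypothetical reconstruction: "NULL" is ASSUMED to be a string mixed with ints,
# so pandas stores "bar" with dtype object
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": ["NULL", "NULL", "NULL", 1, 3, 4, 2, 3],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

print(df.dtypes)

# Columns that are not numeric, i.e. candidates for being skipped by a mean
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['bar']
```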

See also "Automatic exclusion of 'nuisance' columns" in the pandas group-by documentation.
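The exclusion can also be requested explicitly. This sketch assumes pandas 2.x, where GroupBy.mean accepts numeric_only, and reuses the same hypothetical reconstruction with "NULL" stored as a string. With numeric_only=True the result again contains only foo, mirroring what the question observed.

```python
import pandas as pd

# Hypothetical reconstruction: "NULL" is a string, so "bar" has dtype object
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": ["NULL", "NULL", "NULL", 1, 3, 4, 2, 3],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

# numeric_only=True skips non-numeric value columns, as older pandas did silently
res = df.groupby(["id1", "id2"])[["foo", "bar"]].mean(numeric_only=True)
print(res.columns.tolist())  # ['foo']
```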

The solution is to convert the string values to numeric before aggregating, using to_numeric with the parameter errors='coerce', which turns values that cannot be converted into NaN:

import pandas as pd

# Convert "NULL" and any other non-numeric strings to NaN
df['bar'] = pd.to_numeric(df['bar'], errors='coerce')
res = df.groupby(["id1", "id2"])[["foo", "bar"]].mean()
print(res)
          foo  bar
id1 id2           
1   1    5.75  3.0
    2    5.50  2.0
    3    7.00  3.0
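A self-contained sketch of the conversion step, again on the hypothetical reconstruction with "NULL" stored as a string. It also shows why bar for group (1, 1) is 3.0: like SQL AVG, mean skips NaN, and count reports only the non-missing values, like SQL COUNT(bar).

```python
import pandas as pd

# Hypothetical reconstruction: "NULL" is a string, so "bar" has dtype object
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": ["NULL", "NULL", "NULL", 1, 3, 4, 2, 3],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

# errors="coerce" turns the unparseable "NULL" strings into NaN
df["bar"] = pd.to_numeric(df["bar"], errors="coerce")
print(df["bar"].dtype)  # float64

g = df.groupby(["id1", "id2"])["bar"]
# Like SQL AVG, mean() skips NaN: group (1, 1) averages only its single value 3
print(g.mean())
# Like SQL COUNT(bar), count() counts only non-missing values
print(g.count())
```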

But if the column holds mixed data, numbers together with strings such as "NULL", you can use replace instead:

import numpy as np
df['bar'] = df['bar'].replace("NULL", np.nan)
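A sketch of the replace route on the same hypothetical data. The explicit astype(float) is an addition that is not part of the answer above: depending on the pandas version, replace may leave the column with dtype object (newer releases stop silently downcasting its result), so casting makes the column numeric before grouping. Note that astype(float) raises a ValueError if any other non-numeric string remains; in that case to_numeric with errors='coerce' is the safer choice.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction: "NULL" is a string, so "bar" has dtype object
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": ["NULL", "NULL", "NULL", 1, 3, 4, 2, 3],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

# Replace the placeholder string with a real missing value ...
df["bar"] = df["bar"].replace("NULL", np.nan)
# ... then cast explicitly, since replace() may leave dtype object
df["bar"] = df["bar"].astype(float)

res = df.groupby(["id1", "id2"])[["foo", "bar"]].mean()
print(res)
```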
