pandas aggregate dataframe returns only one column
Question
I have a pandas DataFrame (df) like this:
   foo  id1   bar  id2
0  8.0    1  NULL    1
1  5.0    1  NULL    1
2  3.0    1  NULL    1
3  4.0    1     1    2
4  7.0    1     3    2
5  9.0    1     4    3
6  5.0    1     2    3
7  7.0    1     3    1
...
I want to group by id1 and id2 and get the mean of foo and bar.
My code:
res = df.groupby(["id1","id2"])["foo","bar"].mean()
What I get is almost what I expect:
              foo
id1 id2
1   1    5.750000
    2    7.000000
2   1    3.500000
    2    1.500000
3   1    6.000000
    2    5.333333
The values in column "foo" are exactly the means I am looking for, but where is my column "bar"?
In SQL terms, I am looking for the result of "select avg(foo), avg(bar) from dataframe group by id1, id2;" (Sorry for this, but I am more of an SQL person and new to pandas, and I need it now.)
What I also tried:
groupedFrame = res.groupby(["id1","id2"])
aggrFrame = groupedFrame.aggregate(numpy.mean)
This gives me exactly the same result, still missing column "bar".
Sites I have read:
- http://wesmckinney.com/blog/groupby-fu-improvements-in-grouping-and-aggregating-data-in-pandas/
- https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.aggregate.html
- and documentation for group-by but I cannot post the link here.
What am I doing wrong? Thanks in advance.
Recommended answer
The problem is that your column bar is not numeric, so the aggregate function omits it.
You can check the dtype of the omitted column; it is not numeric:
print (df['bar'].dtype)
object
For background, see the pandas documentation on automatic exclusion of "nuisance" columns.
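A minimal sketch of the dropping behavior, using a shortened, hypothetical version of the question's frame. As far as I know, pandas 2.0 and later no longer drop such columns silently and raise a TypeError instead, so the example requests the exclusion explicitly with numeric_only=True; treat the version boundary as an assumption and check your installed version.

```python
import pandas as pd

# Shortened, hypothetical sample; "NULL" is assumed to be a plain string,
# which makes the whole bar column non-numeric.
df = pd.DataFrame({
    "foo": [8.0, 5.0, 4.0, 7.0],
    "id1": [1, 1, 1, 1],
    "bar": ["NULL", "NULL", "1", "3"],
    "id2": [1, 1, 2, 2],
})

# Inspect the column types: bar is not reported as a numeric dtype.
print(df.dtypes)

# numeric_only=True explicitly excludes non-numeric columns from the mean,
# so only foo survives in the result.
numeric_only_res = df.groupby(["id1", "id2"]).mean(numeric_only=True)
print(numeric_only_res)
```

The result has foo as its only column, which mirrors the output in the question.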
The solution is to convert the string values to numeric before aggregating, using to_numeric with the parameter errors='coerce', which turns values that cannot be converted into NaN:
df['bar'] = pd.to_numeric(df['bar'], errors='coerce')
res = df.groupby(["id1","id2"])[["foo","bar"]].mean()
print (res)
          foo  bar
id1 id2
1   1    5.75  3.0
    2    5.50  2.0
    3    7.00  3.0
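For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the eight sample rows from the question and applies the conversion. It assumes "NULL" was loaded as a plain string, and it selects the columns with a list (double brackets), which recent pandas versions require.

```python
import pandas as pd

# The eight rows shown in the question; "NULL" is assumed to be a string,
# so bar starts out as a non-numeric column.
df = pd.DataFrame({
    "foo": [8.0, 5.0, 3.0, 4.0, 7.0, 9.0, 5.0, 7.0],
    "id1": [1, 1, 1, 1, 1, 1, 1, 1],
    "bar": ["NULL", "NULL", "NULL", "1", "3", "4", "2", "3"],
    "id2": [1, 1, 1, 2, 2, 3, 3, 1],
})

# Strings that cannot be parsed as numbers ("NULL") become NaN.
df["bar"] = pd.to_numeric(df["bar"], errors="coerce")

# mean() skips NaN by default, so each group averages only its valid values.
res = df.groupby(["id1", "id2"])[["foo", "bar"]].mean()
print(res)
```

Because NaN values are skipped, the group (1, 1) averages bar over its single valid value rather than over all four rows.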
If the data is mixed (numbers together with strings), you can instead use replace:
df['bar'] = df['bar'].replace("NULL", np.nan)
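A small sketch of the replace approach on a hypothetical mixed column. Depending on the pandas version, the column may still have object dtype after replace, so the explicit astype(float) here is an added safeguard and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed column: real integers plus the placeholder string "NULL".
bar = pd.Series(["NULL", 1, 3, "NULL", 2], dtype=object)

# Swap the placeholder for NaN, then cast so the column is reliably numeric.
cleaned = bar.replace("NULL", np.nan).astype(float)

print(cleaned.dtype)
print(cleaned.mean())
```

Unlike to_numeric with errors='coerce', this only handles the placeholder you name; any other stray string would make the cast fail, which can be useful when you want unexpected values to surface.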