How can I get pandas' groupby command to return a DataFrame instead of a Series?
Problem description
I don't understand the output of pandas' groupby. I started with a DataFrame (df0) with 5 fields/columns (zip, city, location, population, state).
>>> df0.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29467 entries, 0 to 29466
Data columns (total 5 columns):
zip 29467 non-null object
city 29467 non-null object
loc 29467 non-null object
pop 29467 non-null int64
state 29467 non-null object
dtypes: int64(1), object(4)
memory usage: 1.1+ MB
I wanted to get the total population of each city, but since several cities have multiple zip codes, I thought I would use groupby.sum as follows:
df6 = df0.groupby(['city','state'])['pop'].sum()
However, this returned a Series instead of a DataFrame:
>>> df6.info()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 2672, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'info'
>>> type(df6)
<class 'pandas.core.series.Series'>
I would like to be able to use something like
df0[df0['city'].isin(['ALBANY'])]
but since I have a Series instead of a DataFrame, I can't. I haven't been able to force a conversion into a DataFrame either.
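What I'd like to know now is:
- Why do I get a Series instead of a DataFrame?
- How can I get a table that lets me look up the population of a city? Can I use the Series I got from groupby, or should I take a different approach?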
Recommended answer
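Selecting a single column with ['pop'] before aggregating gives a SeriesGroupBy, so sum() returns a Series whose index is a MultiIndex built from the group keys (city, state). To turn that MultiIndex into ordinary columns, pass the parameter as_index=False to groupby, or call reset_index on the result: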
df6 = df0.groupby(['city','state'], as_index=False)['pop'].sum()
Or:
df6 = df0.groupby(['city','state'])['pop'].sum().reset_index()
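A minimal sketch on hypothetical toy data that makes the type difference visible. It also shows two further options that are not part of the original answer: selecting with a list, [['pop']], keeps a DataFrameGroupBy so the sum stays a DataFrame, and Series.to_frame() converts an existing Series. Both of these keep city and state in the index; only as_index=False or reset_index moves them into columns.

```python
import pandas as pd

# Hypothetical toy data using the question's column names
df0 = pd.DataFrame({'city': ['a', 'a', 'b'],
                    'state': ['t', 't', 'n'],
                    'pop': [7, 8, 9]})

# One column label -> SeriesGroupBy -> the sum is a Series indexed by (city, state)
s = df0.groupby(['city', 'state'])['pop'].sum()
print(type(s).__name__)             # Series
print(list(s.index.names))          # ['city', 'state']

# A list of column labels -> DataFrameGroupBy -> the sum stays a DataFrame,
# but city and state are still in the MultiIndex, not in columns
d = df0.groupby(['city', 'state'])[['pop']].sum()
print(type(d).__name__)             # DataFrame

# An existing Series can also be converted afterwards; the MultiIndex is kept
print(type(s.to_frame()).__name__)  # DataFrame

# as_index=False (or reset_index) is what turns city and state into columns
flat = df0.groupby(['city', 'state'], as_index=False)['pop'].sum()
print(list(flat.columns))           # ['city', 'state', 'pop']
```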
Example:
import pandas as pd

df0 = pd.DataFrame({'city':['a','a','b'],
                    'state':['t','t','n'],
                    'pop':[7,8,9]})
print (df0)
city pop state
0 a 7 t
1 a 8 t
2 b 9 n
df6 = df0.groupby(['city','state'], as_index=False)['pop'].sum()
print (df6)
city state pop
0 a t 15
1 b n 9
df6 = df0.groupby(['city','state'])['pop'].sum().reset_index()
print (df6)
city state pop
0 a t 15
1 b n 9
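As a quick sanity check, a sketch on the same toy data confirming that both approaches give the same frame. pd.testing.assert_frame_equal raises AssertionError if the values, dtypes, columns or index differ; this was reasoned about against recent pandas releases, and very old versions may not behave identically.

```python
import pandas as pd

df0 = pd.DataFrame({'city': ['a', 'a', 'b'],
                    'state': ['t', 't', 'n'],
                    'pop': [7, 8, 9]})

# Option 1: keep the group keys as columns from the start
by_param = df0.groupby(['city', 'state'], as_index=False)['pop'].sum()

# Option 2: aggregate to a MultiIndex Series, then move the index levels into columns
by_reset = df0.groupby(['city', 'state'])['pop'].sum().reset_index()

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(by_param, by_reset)
print("identical")
```

Which one to use is mostly taste: as_index=False avoids building the MultiIndex at all, while reset_index is handy when you already have the Series.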
Finally, select with loc; to get a scalar, add item():
print (df6.loc[df6.state == 't', 'pop'])
0 15
Name: pop, dtype: int64
print (df6.loc[df6.state == 't', 'pop'].item())
15
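The flat table also supports the kind of filter the question was aiming for (df0[df0['city'].isin(['ALBANY'])]). A minimal sketch on the toy data, where 'a' and 't' stand in for a real city and state; it matches on both keys because the grouping used both, and the same city name can appear in several states.

```python
import pandas as pd

df0 = pd.DataFrame({'city': ['a', 'a', 'b'],
                    'state': ['t', 't', 'n'],
                    'pop': [7, 8, 9]})
df6 = df0.groupby(['city', 'state'], as_index=False)['pop'].sum()

# The boolean-mask filter from the question now works on the aggregated table
by_city = df6[df6['city'].isin(['a'])]
print(by_city)

# Match both keys, then take the single matching value as a Python scalar
mask = (df6['city'] == 'a') & (df6['state'] == 't')
total = df6.loc[mask, 'pop'].item()
print(total)   # 15
```

Note that item() raises ValueError unless exactly one row matches, which is a useful guard when a city name is ambiguous.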
But if you only need a lookup table, you can use the Series with its MultiIndex directly:
s = df0.groupby(['city','state'])['pop'].sum()
print (s)
city state
a t 15
b n 9
Name: pop, dtype: int64
# select all cities with : and the state with a label such as 't'
# the output is a Series of length 1
print (s.loc[:, 't'])
city
a 15
Name: pop, dtype: int64
# if you need the output as a scalar, add item()
print (s.loc[:, 't'].item())
15
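A few other lookups the MultiIndex Series supports, sketched on the same toy data: a full (city, state) tuple returns the value directly, a first-level label returns the populations by state for that city, and Series.xs selects on a named inner level without the slice syntax.

```python
import pandas as pd

df0 = pd.DataFrame({'city': ['a', 'a', 'b'],
                    'state': ['t', 't', 'n'],
                    'pop': [7, 8, 9]})
s = df0.groupby(['city', 'state'])['pop'].sum()

# A full (city, state) tuple selects exactly one entry and returns a scalar
print(s.loc[('a', 't')])          # 15

# Only the first level: a Series indexed by the remaining 'state' level
by_state = s.loc['a']
print(by_state)

# Select on the named inner level; the 'state' level is dropped from the result
by_city = s.xs('t', level='state')
print(by_city)
```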