Pandas: group by and Pivot table difference
Question
I just started learning pandas and was wondering whether there is any difference between the pandas groupby and pivot_table functions. Can anyone help me understand the difference between them?
Help would be appreciated.
Recommended answer
Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.
Using pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum), a table is created where a is on the row axis, b is on the column axis, and the values are the sums of c.
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({"a": [1,2,3,1,2,3], "b":[1,1,1,2,2,2], "c":np.random.rand(6)})
pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum)
b         1         2
a
1  0.528470  0.484766
2  0.187277  0.144326
3  0.866832  0.650100
Using groupby, the dimensions given all go on the row axis (as index levels), and one row is created for each combination of those dimensions.
In this example, we create a Series of the sums of the values in c, grouped by all unique combinations of a and b.
df.groupby(['a','b'])['c'].sum()
a  b
1  1    0.528470
   2    0.484766
2  1    0.187277
   2    0.144326
3  1    0.866832
   2    0.650100
Name: c, dtype: float64
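To make the "same aggregation, different shape" point concrete, here is a minimal sketch that reshapes the groupby result into the pivot_table layout. It assumes fixed values for c (instead of np.random.rand) so the run is reproducible, passes scalar arguments to pivot_table, and uses unstack, which the answer above does not mention, to move b from the row index to the columns.

```python
import pandas as pd

# Fixed values instead of np.random.rand so the result is reproducible.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [0.5, 0.25, 1.0, 2.0, 4.0, 8.0],
})

# Long shape: one row per (a, b) combination, both keys in the row index.
grouped = df.groupby(["a", "b"])["c"].sum()

# Wide shape: move the inner index level "b" to the column axis.
wide = grouped.unstack("b")

# pivot_table with a scalar `values` builds the wide table directly.
pivoted = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# Compare the two wide tables.
print(wide.equals(pivoted))
```

For this data the two tables should match, which suggests choosing between the two functions based on the layout you want rather than on the aggregation itself.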
A similar usage of groupby is to omit the ['c']. In this case, it creates a DataFrame (not a Series) of the sums of all remaining columns, grouped by unique values of a and b.
print(df.groupby(["a", "b"]).sum())
            c
a b
1 1  0.528470
  2  0.484766
2 1  0.187277
  2  0.144326
3 1  0.866832
  2  0.650100
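The Series-versus-DataFrame distinction is easier to see when there is more than one value column. The sketch below is only an illustration: the extra column d and all the numbers are hypothetical and do not come from the example above.

```python
import pandas as pd

# "d" is a hypothetical extra column added only for this illustration.
df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [1, 2, 1, 2],
    "c": [0.5, 1.5, 2.5, 3.5],
    "d": [10, 20, 30, 40],
})

# Selecting one column before aggregating yields a Series.
as_series = df.groupby(["a", "b"])["c"].sum()

# Without a selection, every non-key column is summed, yielding a DataFrame.
as_frame = df.groupby(["a", "b"]).sum()

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(list(as_frame.columns))     # ['c', 'd']
```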