Seaborn groupby pandas Series
Problem description
I want to visualize my data as box plots grouped by another variable, as shown here in my terrible drawing:
So I use a pandas Series variable to tell pandas that I have grouped variables. This is what I do:
import pandas as pd
import seaborn as sns
# example data for reproducibility
a = pd.DataFrame(
[
[2, 1],
[4, 2],
[5, 1],
[10, 2],
[9, 2],
[3, 1]
])
#converting second column to Series
a.ix[:,1] = pd.Series(a.ix[:,1])
#Plotting by seaborn
sns.boxplot(a, groupby=a.ix[:,1])
This is what I get:
However, I expected two boxplots, each describing only the first column and grouped by the corresponding values in the second column (the column converted to a Series). The plot above instead shows each column separately, which is not what I want.
Recommended answer
A column in a DataFrame is already a Series, so your conversion is not necessary. Furthermore, if you only want to use the first column for both boxplots, you should pass only that column to Seaborn.
So:
# example data for reproducibility
df = pd.DataFrame(
[
[2, 1],
[4, 2],
[5, 1],
[10, 2],
[9, 2],
[3, 1]
], columns=['a', 'b'])
#Plotting by seaborn
sns.boxplot(df.a, groupby=df.b)
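The `groupby` argument used above belongs to older seaborn releases; as far as I know it was dropped when the categorical plotting API was reworked (around seaborn 0.6), so the call may fail on a current install. A minimal sketch of what I believe is the equivalent call today, using the `x`/`y`/`data` keywords. The variable name `group_medians` is only there for illustration:

```python
import pandas as pd
import seaborn as sns

# Same example data as above, with labelled columns.
df = pd.DataFrame(
    [[2, 1], [4, 2], [5, 1], [10, 2], [9, 2], [3, 1]],
    columns=["a", "b"],
)

# One box per distinct value of "b", each summarising the "a" values of that group.
ax = sns.boxplot(data=df, x="b", y="a")

# The same grouping done in pandas, useful for checking what each box represents.
group_medians = df.groupby("b")["a"].median()
print(group_medians)
```

In a script, call `matplotlib.pyplot.show()` afterwards to display the figure; in a notebook the plot normally appears on its own.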
I changed your example a little; giving the columns labels makes it a bit clearer, in my opinion.
If you want to plot all columns separately, you (I think) basically want all combinations of the values in your groupby column and any other column. So if your DataFrame looks like this:
a b grouper
0 2 5 1
1 4 9 2
2 5 3 1
3 10 6 2
4 9 7 2
5 3 11 1
And you want boxplots for columns a and b, grouped by the column grouper, then you should flatten the columns and change the groupby column to contain values like a1, a2, b1, etc.
Here is a crude way that I think should work, given the DataFrame shown above:
dfpiv = df.pivot(index=df.index, columns='grouper')
cols_flat = [dfpiv.columns.levels[0][i] + str(dfpiv.columns.levels[1][j]) for i, j in zip(dfpiv.columns.labels[0], dfpiv.columns.labels[1])]
dfpiv.columns = cols_flat
dfpiv = dfpiv.stack(0)
sns.boxplot(dfpiv, groupby=dfpiv.index.get_level_values(1))
Perhaps there are fancier ways of restructuring the DataFrame. In particular, the flattening of the hierarchy after pivoting is hard to read; I don't like it.
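One possibly more readable restructuring, offered as a sketch rather than a tested replacement for the pivot code: reshape to long form with `DataFrame.melt` and build the combined labels (a1, a2, b1, ...) directly. The column names `column`, `value` and `label` are my own choices, and the call assumes a seaborn version that accepts `x`/`y`/`data`:

```python
import pandas as pd
import seaborn as sns

# The DataFrame shown above.
df = pd.DataFrame({
    "a": [2, 4, 5, 10, 9, 3],
    "b": [5, 9, 3, 6, 7, 11],
    "grouper": [1, 2, 1, 2, 2, 1],
})

# Long form: one row per (original row, column) pair, keeping the group id.
long_df = df.melt(id_vars="grouper", var_name="column", value_name="value")

# Combine column name and group id into labels such as "a1", "a2", "b1", "b2".
long_df["label"] = long_df["column"] + long_df["grouper"].astype(str)

# One box per label, in sorted order.
labels = sorted(long_df["label"].unique())
ax = sns.boxplot(data=long_df, x="label", y="value", order=labels)
```

An alternative with the same long-form data is `sns.boxplot(data=long_df, x="column", y="value", hue="grouper")`, which keeps the two variables separate and lets seaborn colour the boxes by group.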