Seaborn: countplot() with frequencies


Question

I have a Pandas DataFrame with a column called "AXLES", which can take an integer value between 3 and 12. I am trying to use Seaborn's countplot() to produce the following plot:

  1. The left y axis shows the frequencies of these values occurring in the data. The axis extent is [0%-100%], with tick marks at every 10%.
  2. The right y axis shows the actual counts, with values corresponding to the tick marks determined by the left y axis (marked at every 10%).
  3. The x axis shows the categories for the bar plots [3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
  4. Annotations on top of the bars show the actual percentage of each category.

The following code gives me the plot below, with actual counts, but I could not find a way to convert them into frequencies. I can get the frequencies using df.AXLES.value_counts()/len(df.index) but I am not sure about how to plug this information into Seaborn's countplot().

I also found a workaround for the annotations, but I am not sure if that is the best implementation.

Any help would be greatly appreciated. Thanks!

plt.figure(figsize=(12,8))
ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')
plt.ylabel('Frequency [%]')

for p in ax.patches:
    ax.annotate('%{:.1f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))

I got closer to what I need with the following code, using Pandas' bar plot, ditching Seaborn. Feels like I'm using so many workarounds, and there has to be an easier way to do it. The issues with this approach:

  • There is no order keyword in Pandas' bar plot function as Seaborn's countplot() has, so I cannot plot all categories from 3-12 as I did in the countplot(). I need to have them shown even if there is no data in that category.
  • The secondary y-axis messes up the bars and the annotation for some reason (see the white gridlines drawn over the text and bars).

plt.figure(figsize=(12,8))
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')
plt.ylabel('Frequency [%]')

ax = (dfWIM.AXLES.value_counts()/len(df)*100).sort_index().plot(kind="bar", rot=0)
ax.set_yticks(np.arange(0, 110, 10))

ax2 = ax.twinx()
ax2.set_yticks(np.arange(0, 110, 10)*len(df)/100)

for p in ax.patches:
    ax.annotate('{:.2f}%'.format(p.get_height()), (p.get_x()+0.15, p.get_height()+1))

Recommended answer

You can do this by making a twinx axes for the frequencies. You can then swap the two y axes so the frequencies stay on the left and the counts on the right, without having to recalculate the counts axis. Here we use tick_left() and tick_right() to move the ticks and set_label_position() to move the axis labels.

You can then set the ticks using the matplotlib.ticker module, specifically ticker.MultipleLocator and ticker.LinearLocator.
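To see what the two locators produce, here is a minimal sketch that calls them directly, outside of any figure. The value of ncount is a made-up total chosen only for illustration.

```python
import numpy as np
import matplotlib.ticker as ticker

ncount = 5000  # hypothetical total number of rows

# LinearLocator(11) spreads exactly 11 ticks evenly over the given range,
# so the count axis gets one tick at every 10% of ncount.
count_ticks = ticker.LinearLocator(11).tick_values(0, ncount)

# MultipleLocator(10) proposes ticks at multiples of 10. It may also return
# candidates just outside the requested range, so filter to [0, 100] here.
pct_ticks = ticker.MultipleLocator(10).tick_values(0, 100)
pct_in_range = [t for t in pct_ticks if 0 <= t <= 100]

print(count_ticks)
print(pct_in_range)
```

Because both axes end up with 11 evenly spaced ticks over their full range, each count tick lines up with a percentage tick.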

As for your annotations, you can get the x and y extents of each bar with patch.get_bbox().get_points(), which returns the lower-left and upper-right corners of the bar. This, along with setting the horizontal and vertical alignment correctly, means you don't need to add any arbitrary offsets to the annotation location.
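As a small illustration of the bounding-box approach, the sketch below builds a standalone Rectangle patch (the same patch type a bar plot draws) and derives the label position and text from it. The bar geometry and ncount are invented values, not taken from the question's data.

```python
import matplotlib.patches as mpatches

# A hypothetical bar: left edge at x=0.6, width 0.8, height 120
bar = mpatches.Rectangle((0.6, 0), 0.8, 120)

pts = bar.get_bbox().get_points()   # 2x2 array: [[x0, y0], [x1, y1]]
x_center = pts[:, 0].mean()         # midpoint between left and right edges
y_top = pts[1, 1]                   # top edge of the bar

ncount = 400                        # hypothetical total count
label = '{:.1f}%'.format(100. * y_top / ncount)
print(x_center, y_top, label)
```

Anchoring the text at (x_center, y_top) with ha='center' and va='bottom' places it directly above the bar regardless of the bar's width or the axis scale.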

Finally, you need to turn the grid off for the twinned axis, to prevent grid lines showing up on top of the bars (ax2.grid(None)).

Here's a working script:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import matplotlib.ticker as ticker

# Some random data
dfWIM = pd.DataFrame({'AXLES': np.random.normal(8, 2, 5000).astype(int)})
ncount = len(dfWIM)

plt.figure(figsize=(12,8))
ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')

# Make twin axis
ax2=ax.twinx()

# Switch so count axis is on right, frequency on left
ax2.yaxis.tick_left()
ax.yaxis.tick_right()

# Also switch the labels over
ax.yaxis.set_label_position('right')
ax2.yaxis.set_label_position('left')

ax2.set_ylabel('Frequency [%]')

for p in ax.patches:
    x=p.get_bbox().get_points()[:,0]
    y=p.get_bbox().get_points()[1,1]
    ax.annotate('{:.1f}%'.format(100.*y/ncount), (x.mean(), y), 
            ha='center', va='bottom') # set the alignment of the text

# Use a LinearLocator to ensure the correct number of ticks
ax.yaxis.set_major_locator(ticker.LinearLocator(11))

# Fix the frequency range to 0-100
ax2.set_ylim(0,100)
ax.set_ylim(0,ncount)

# And use a MultipleLocator to ensure a tick spacing of 10
ax2.yaxis.set_major_locator(ticker.MultipleLocator(10))

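# Need to turn the grid on ax2 off, otherwise the gridlines end up on top of the bars.
# grid(False) switches it off explicitly; in recent matplotlib, grid(None) toggles the current state.
ax2.grid(False)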

plt.savefig('snscounter.pdf')
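
For the pandas-only route mentioned in the question, where the bar plot has no order keyword, one possible way to keep empty categories is to reindex the value counts before plotting. This is a sketch under assumptions: the sample data is invented, and the names axles, categories, and pct are illustrative only.

```python
import pandas as pd

# Hypothetical sample in which several axle categories have no rows
axles = pd.Series([3, 3, 5, 5, 5, 12], name='AXLES')

categories = range(3, 13)

# normalize=True gives fractions; reindex adds the missing categories as 0
# and puts the result in the order given by categories.
pct = (axles.value_counts(normalize=True) * 100).reindex(categories, fill_value=0)
print(pct)
```

Calling pct.plot(kind="bar", rot=0) on the result should then draw a bar slot for every category from 3 to 12, including the empty ones.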
