What is the major difference between histogram, countplot and distplot in the Seaborn library?

Question

I think they all look the same but there must be some difference.

They all take a single column as input, and the y-axis has the count for all plots.

Answer

Those plotting functions, pyplot.hist, seaborn.countplot and seaborn.distplot, are all helper tools to plot the frequency of a single variable. Depending on the nature of this variable, they are more or less suitable for visualizing it.

A continuous variable x can be histogrammed to show its frequency distribution.

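import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(100)*100   # 100 continuous values in [0, 100)
# count the values falling into each of the 10-wide bins 0-10, 10-20, ..., 90-100
hist, edges = np.histogram(x, bins=np.arange(0,101,10))
# draw one bar per bin, starting at its left edge and spanning the full bin width
plt.bar(edges[:-1], hist, align="edge", ec="k", width=np.diff(edges))

plt.show()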
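The bar call above relies on the shape of what np.histogram returns: there is always one more edge than there are counts. A minimal check of that relationship, assuming a seeded np.random.default_rng in place of np.random.rand so the run is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100) * 100            # 100 continuous values in [0, 100)
bins = np.arange(0, 101, 10)         # 11 edges: 0, 10, ..., 100

hist, edges = np.histogram(x, bins=bins)

# One more edge than bins, so bar i spans edges[i] .. edges[i + 1]:
# the left positions are edges[:-1] and the widths are np.diff(edges)
lefts = edges[:-1]
widths = np.diff(edges)

print(len(hist), len(edges))
print(widths)
print(hist.sum())   # every observation lands in exactly one bin
```

This is why the snippet passes edges[:-1] together with align="edge" and width=np.diff(edges) to plt.bar.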

The same can be obtained with pyplot.hist or seaborn.distplot,

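plt.hist(x, bins=np.arange(0,101,10), ec="k")

import seaborn as sns
# note: distplot is deprecated since seaborn 0.11 in favour of histplot / displot
sns.distplot(x, bins=np.arange(0,101,10), kde=False, hist_kws=dict(ec="k"))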

distplot wraps pyplot.hist but adds some further features, e.g. the option to overlay a kernel density estimate.
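A kernel density estimate places a smooth bump on every observation and averages them. The sketch below computes one by hand with NumPy only, assuming a Gaussian kernel and Scott's rule for the bandwidth (the default of scipy.stats.gaussian_kde). It illustrates the idea and is not seaborn's actual implementation.

```python
import numpy as np


def gaussian_kde_1d(data, grid):
    """Evaluate a 1-D Gaussian KDE of `data` at the points in `grid`.

    Bandwidth: Scott's rule, sample std * n**(-1/5) (an assumption that
    mirrors scipy.stats.gaussian_kde's default).
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    bw = data.std(ddof=1) * n ** (-1 / 5)
    # Standardised distance of every grid point to every observation
    z = (grid[:, None] - data[None, :]) / bw
    # Sum of one normal density per observation, divided by n
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
x = rng.random(100) * 100
grid = np.linspace(-100, 200, 3001)
density = gaussian_kde_1d(x, grid)

# A density should integrate to about 1 (Riemann sum over a wide grid)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

To overlay it on a histogram, the histogram must be on the same scale, e.g. plt.hist(x, density=True) followed by plt.plot(grid, density).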

For a discrete variable, a histogram may or may not be suitable. If you use numpy.histogram, the bin edges need to lie exactly in between the expected discrete observations.

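x1 = np.random.randint(1,11,100)   # 100 integers between 1 and 10 (inclusive)

# edges at 0.5, 1.5, ..., 10.5 so that every integer sits in the centre of its own bin
hist, edges = np.histogram(x1, bins=np.arange(1,12)-0.5)
plt.bar(edges[:-1], hist, align="edge", ec="k", width=np.diff(edges))
plt.xticks(np.arange(1,11))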
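To see why the edges have to fall between the integers, the sketch below compares two choices of edges on a small made-up sample in which the value k occurs exactly k times. With edges on the integers themselves there are only nine bins, and because numpy's last bin is closed on the right, 9 and 10 get merged.

```python
import numpy as np

# Deterministic illustrative sample: value k appears k times, k = 1..10
x1 = np.repeat(np.arange(1, 11), np.arange(1, 11))

# Edges at half-integers: one bin per integer, counts 1, 2, ..., 10
good, _ = np.histogram(x1, bins=np.arange(1, 12) - 0.5)

# Edges at the integers: 10 edges give only 9 bins, and the closed
# last bin [9, 10] collects both the 9s and the 10s
bad, _ = np.histogram(x1, bins=np.arange(1, 11))

print(good)
print(bad)
```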

Alternatively, one could count the unique elements in x1,

u, counts = np.unique(x1, return_counts=True)
plt.bar(u, counts, align="center", ec="k", width=1)
plt.xticks(u)

resulting in the same plot as above. The main difference arises when not every possible value actually occurs. Say 5 is not part of your data at all. The histogram approach would still show an (empty) bin at 5, whereas 5 is not among the unique elements and therefore gets no bar.
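The difference can be checked numerically before plotting anything. This sketch, using a seeded generator for reproducibility, compares the half-integer histogram with np.unique once with every value present and once with 5 missing:

```python
import numpy as np

rng = np.random.default_rng(1)
bins = np.arange(1, 12) - 0.5        # edges 0.5, 1.5, ..., 10.5

# Case 1: every value 1..10 occurs (the first ten entries guarantee it)
x1 = np.concatenate([np.arange(1, 11), rng.integers(1, 11, 90)])
hist1, _ = np.histogram(x1, bins=bins)
u1, counts1 = np.unique(x1, return_counts=True)
print(np.array_equal(hist1, counts1))   # identical counts

# Case 2: the value 5 can never be drawn
x2 = rng.choice([1, 2, 3, 4, 6, 7, 8, 9, 10], size=100)
hist2, _ = np.histogram(x2, bins=bins)
u2, counts2 = np.unique(x2, return_counts=True)
print(len(hist2), hist2[4])   # 10 bins; the bin centred on 5 is empty
print(5 in u2)                # 5 is simply absent from the unique values
```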

x2 = np.random.choice([1,2,3,4,6,7,8,9,10], size=100)   # the value 5 never occurs

plt.subplot(1,2,1)
plt.title("histogram")
hist, edges = np.histogram(x2, bins=np.arange(1,12)-0.5)
plt.bar(edges[:-1], hist, align="edge", ec="k", width=np.diff(edges))
plt.xticks(np.arange(1,11))

plt.subplot(1,2,2)
plt.title("counts")
u, counts = np.unique(x2, return_counts=True)
plt.bar(u.astype(str), counts, align="center", ec="k", width=1)

The latter is exactly what seaborn.countplot does.

sns.countplot(x=x2, color="C0")

It is hence suitable for discrete or categorical variables.
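To check that countplot only draws bars for values that actually occur, the sketch below reads the bar heights back from the axes and compares them with np.unique. It assumes seaborn is installed; by default countplot treats numeric values as categories, so there is no slot for the missing 5.

```python
import matplotlib
matplotlib.use("Agg")                 # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(2)
x2 = rng.choice([1, 2, 3, 4, 6, 7, 8, 9, 10], size=100)   # 5 never occurs

fig, ax = plt.subplots()
sns.countplot(x=x2, color="C0", ax=ax)

# One bar per distinct observed value; compare heights order-independently
heights = sorted(p.get_height() for p in ax.patches)
u, counts = np.unique(x2, return_counts=True)
plt.close(fig)

print(len(heights), len(u))
```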

All of these functions, pyplot.hist, seaborn.countplot and seaborn.distplot, act as wrappers around a matplotlib bar plot and may be used if plotting such a bar plot manually is considered too cumbersome.
For continuous variables, pyplot.hist or seaborn.distplot may be used. For discrete variables, seaborn.countplot is more convenient.
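The rule of thumb above can be sketched as a small dispatcher. The helper plot_frequency and its threshold max_categories are hypothetical, made up for illustration: integer-valued numeric data with few distinct values gets one bar per observed value (countplot style), anything else gets an equal-width histogram (pyplot.hist style).

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def plot_frequency(values, ax, max_categories=20, bins=10):
    """Hypothetical helper, not part of matplotlib or seaborn.

    Numeric data that is integer-valued with at most `max_categories`
    distinct values is treated as discrete; everything else as
    continuous. The threshold is an illustrative assumption.
    """
    values = np.asarray(values)
    u, counts = np.unique(values, return_counts=True)
    is_discrete = bool(np.all(np.mod(values, 1) == 0)) and u.size <= max_categories
    if is_discrete:
        # one bar per observed value, like seaborn.countplot
        ax.bar(u.astype(str), counts, ec="k", width=1)
        return "counts"
    # equal-width bins over the data range, like pyplot.hist
    ax.hist(values, bins=bins, ec="k")
    return "histogram"


rng = np.random.default_rng(0)
fig, (ax1, ax2) = plt.subplots(1, 2)
kind_discrete = plot_frequency(rng.integers(1, 11, 100), ax1)
kind_continuous = plot_frequency(rng.random(100) * 100, ax2)
plt.close(fig)
print(kind_discrete, kind_continuous)
```

Any real cut-off between "discrete" and "continuous" depends on the data; the point is only that the choice of wrapper follows from the nature of the variable.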
