catplot(kind="count") is significantly slower than countplot()

Published: 2021/6/1 20:46:21. Tags: python, pandas, matplotlib, seaborn


Question

I am working with a fairly large dataset (~40 million rows). I have found that if I call sns.countplot() directly, my visualisation renders very quickly:

%%time 
ax = sns.countplot(x="age_band",data=acme)

However, if I produce the same visualisation with catplot(kind="count"), execution slows down dramatically:

%%time
g = sns.catplot(x="age_band",data=acme,kind="count")

Is there a reason for such a large performance difference? Is catplot() doing some sort of conversion on my data before it can plot it?

If there is a known reason for this, does it extend to all figure-level functions versus axes-level functions? For example, is sns.scatterplot() faster than sns.relplot(kind="scatter")?

My preference would be to use catplot(), as I like its flexibility and how easily it plots onto a FacetGrid, but if it is going to take so much longer to produce the same plot, I will just use the axes-level functions directly.

Recommended answer

There is a lot of overhead in catplot, or for that matter in FacetGrid, to ensure that the categories are synchronized across the grid. Consider, for example, a variable plotted along the columns of the grid for which not every age group occurs. You would still need to show that non-occurring age group and reserve its color. Hence, two countplots next to each other do not necessarily make up one catplot.
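To make the synchronization idea concrete, here is a minimal pandas-only sketch. It is not seaborn's internal code, and the `region` column and sample values are made up for illustration. It counts age bands per facet and then aligns every facet to the full category list, so a band missing from one facet still gets a slot.

```python
import pandas as pd

# Hypothetical data: the "south" region has no one in two of the age bands.
df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south"],
    "age_band": ["18-25", "26-35", "36-45", "18-25", "18-25"],
})

# The shared category order that every facet must respect.
all_bands = sorted(df["age_band"].unique())

# Count per facet, then reindex so absent bands appear with a count of 0.
per_region = {
    region: group["age_band"].value_counts().reindex(all_bands, fill_value=0)
    for region, group in df.groupby("region")
}

for region, counts in per_region.items():
    print(region, counts.to_dict())
```

The `south` series keeps entries for the two age bands that never occur there, each with a count of 0. Two independent countplot calls, one per region, would not know about each other's categories, which is the bookkeeping a faceted plot has to do on top of the counting itself.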

However, if you are only interested in a single countplot, a catplot is clearly overkill. On the other hand, even a single countplot is overkill compared to a bar plot of the counts. That is,

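import matplotlib.pyplot as plt
import numpy as np
# Count each category once with pandas, then draw one bar per category
counts = df["Category"].value_counts().sort_index()
colors = plt.cm.tab10(np.arange(len(counts)))
ax = counts.plot.bar(color=colors)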

would be twice as fast as

ax = sns.countplot(x="Category", data=df)
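The relative speed depends on the data, the machine, and the installed seaborn version, so it is worth measuring on your own setup. The following is a rough timing sketch under assumptions: the synthetic `df`, the `timed` helper, and the sample size are all invented for illustration, and the `Agg` backend is selected only so that the script runs without a display.

```python
import time

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real data (assumption: five categories).
rng = np.random.default_rng(0)
df = pd.DataFrame({"Category": rng.choice(list("ABCDE"), size=200_000)})


def timed(label, plot):
    """Run one plotting callable, print and return its wall-clock time."""
    start = time.perf_counter()
    plot()
    elapsed = time.perf_counter() - start
    plt.close("all")  # free the figure so runs do not affect each other
    print(f"{label}: {elapsed:.3f} s")
    return elapsed


def pandas_bar():
    # Aggregate first, then plot only the handful of counts.
    counts = df["Category"].value_counts().sort_index()
    counts.plot.bar()


timings = {
    "value_counts + bar": timed("value_counts + bar", pandas_bar),
    "countplot": timed(
        "countplot", lambda: sns.countplot(x="Category", data=df)
    ),
    "catplot": timed(
        "catplot", lambda: sns.catplot(x="Category", data=df, kind="count")
    ),
}
```

A single run like this is noisy, and the first plotting call may include one-off setup cost, so repeat the measurement a few times before drawing conclusions. The same harness can be pointed at sns.scatterplot() and sns.relplot(kind="scatter") to check the other figure-level versus axes-level pair.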
