catplot(kind="count") is significantly slower than countplot()

Question

I am working on a fairly large dataset (~40 million rows). I have found that if I call sns.countplot() directly, the plot renders really quickly:

%%time
ax = sns.countplot(x="age_band", data=acme)

However, if I produce the same visualisation using catplot(kind="count"), execution slows down dramatically:

%%time
g = sns.catplot(x="age_band", data=acme, kind="count")

Is there a reason for such a large performance difference? Is catplot() doing some sort of conversion on my data before it can plot it?

If there is a known reason for this, does it extend to all figure-level functions versus axes-level functions? For example, is sns.scatterplot() faster than sns.relplot(kind="scatter")?

My preference would be to use catplot(), as I like its flexibility and easy plotting on a FacetGrid. But if it is going to take so much longer to produce the same plot, I will just use the axes-level functions directly.

Recommended answer

There is a lot of overhead in catplot, or for that matter in FacetGrid, to ensure that the categories are synchronized across the grid. Consider, for example, a variable plotted along the columns of the grid for which not every age group occurs in every column. You would still need to show the non-occurring age group and reserve its color. Hence, two countplots next to each other do not necessarily make up one catplot.
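The synchronization described above can be illustrated without any plotting at all. The following is a minimal sketch, not seaborn's actual implementation: the toy data, the `region` facet variable, and the `all_bands` list are hypothetical names chosen for illustration. It shows that each facet has to be laid out against the full category set, including categories that never occur in that facet.

```python
import pandas as pd

# Hypothetical toy data: "region" stands in for the variable plotted
# along the columns of the grid.
df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south"],
    "age_band": ["18-25", "26-35", "36-45", "18-25", "18-25"],
})

# The full, ordered category set that every facet must agree on
# (assumed known up front; "46-55" occurs in no facet at all).
all_bands = ["18-25", "26-35", "36-45", "46-55"]

# One row per facet, one column per category. Reindexing forces every
# facet onto the same columns, so absent categories appear as 0.
counts = (
    pd.crosstab(df["region"], df["age_band"])
    .reindex(columns=all_bands, fill_value=0)
)
print(counts)
```

Each row of `counts` now has the same index, so bars for a given age band would land at the same position (and could take the same color) in every facet, even where the count is zero. Two independent countplots would not guarantee that alignment.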

However, if you are only interested in a single countplot, a catplot is clearly overkill. On the other hand, even a single countplot is overkill compared to a bar plot of the counts. That is,

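import matplotlib.pyplot as plt
import numpy as np
# Count each category once, then draw one bar per category.
counts = df["Category"].value_counts().sort_index()
colors = plt.cm.tab10(np.arange(len(counts)))
ax = counts.plot.bar(color=colors)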

would be twice as fast as

ax = sns.countplot(x="Category", data=df)
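To get a feel for why pre-aggregating helps, the counting step can be timed on its own. This is a rough sketch on synthetic data: the row count, the labels, and the random seed are all made up for illustration, and the timing will vary by machine, so no figure is claimed here.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for the large column; size and labels are assumptions.
n_rows = 1_000_000
labels = ["18-25", "26-35", "36-45", "46-55"]
rng = np.random.default_rng(0)
age_band = pd.Series(rng.choice(labels, size=n_rows), name="age_band")

# Time only the aggregation: a single pass over the column.
start = time.perf_counter()
counts = age_band.value_counts().sort_index()
elapsed = time.perf_counter() - start

print(f"aggregated {n_rows:,} rows into {len(counts)} bars in {elapsed:.3f} s")
# Any plotting call made on `counts` now only has to draw len(counts) bars,
# regardless of how many rows the original column had.
```

The same `time.perf_counter()` pattern can be wrapped around the countplot and catplot calls on your own data to compare all three approaches directly.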
