What are ways to speed up seaborn's pairplot?

Question

I have a DataFrame with 250,000 rows and 140 columns, and I'm trying to construct a pair plot of the variables. I know that the number of subplots is huge, and so is the time it takes to draw them. (I have been waiting for more than an hour on a 3.4 GHz i5 with 32 GB of RAM.)
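To put the size of the task into numbers, here is a small back-of-the-envelope calculation. It assumes the default pairplot layout of one panel per pair of columns, with scatter plots off the diagonal; the variable names are only illustrative.

```python
# Rough size of the full pair plot described in the question.
n_rows, n_cols = 250_000, 140

panels = n_cols ** 2                    # one panel per (row variable, column variable) pair
scatter_panels = n_cols * (n_cols - 1)  # off-diagonal panels, each a scatter plot
markers = scatter_panels * n_rows       # every scatter panel draws every row once

print(panels, scatter_panels, markers)  # 19600 19460 4865000000
```

Nearly five billion markers spread over almost twenty thousand axes goes a long way towards explaining why the call had not finished after an hour.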

Remembering that scikit-learn can build random forests in parallel, I checked whether this is also possible with seaborn. However, I didn't find anything. The source code seems to call the matplotlib plot function for every single image.

Couldn't this be parallelised? If so, what would be a good way to start from here?

Recommended answer

Rather than parallelizing, you could downsample your DataFrame to, say, 1000 rows to get a quick peek, if the speed bottleneck is indeed occurring there. 1000 points are usually enough to get a general idea of what's going on.

sns.pairplot(df.sample(1000))
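A slightly fuller sketch of the same idea follows. With 140 columns, row sampling alone still leaves 19,600 panels, so this version also restricts the columns. The helper names `downsample_for_pairplot` and `plot_preview`, the choice of six columns, and the "highest variance" selection rule are all assumptions made for illustration, not part of the answer; pick columns by whatever criterion suits your data. The demo frame is synthetic and much smaller than the one in the question.

```python
import numpy as np
import pandas as pd


def downsample_for_pairplot(df, n_rows=1000, n_cols=6, seed=0):
    """Return a random subset of rows, restricted to a few numeric columns."""
    numeric = df.select_dtypes(include="number")
    # Keep the n_cols columns with the largest variance (an arbitrary heuristic).
    cols = numeric.var().nlargest(n_cols).index.tolist()
    # Sample without replacement, never asking for more rows than exist.
    return numeric[cols].sample(n=min(n_rows, len(numeric)), random_state=seed)


def plot_preview(df, **kwargs):
    """Draw a pair plot of the reduced frame; kwargs go to the helper above."""
    import seaborn as sns  # imported lazily so the helper works without seaborn

    # Small markers keep dense scatter panels readable.
    return sns.pairplot(downsample_for_pairplot(df, **kwargs), plot_kws={"s": 5})


# Synthetic stand-in for the DataFrame from the question.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(5000, 20)), columns=[f"x{i}" for i in range(20)])

small = downsample_for_pairplot(demo)
print(small.shape)  # (1000, 6)
```

Calling `plot_preview(demo)` would then draw 36 panels of 1000 points each instead of the full grid. Fixing `seed` makes the preview reproducible between runs.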
