Seaborn pairplot error when dataset has NaN values

Question

I have a pandas DataFrame in which the first column holds categorical data and the remaining columns are numeric. Because this is real engineering data, NaN values (and zeros) are scattered across different rows and columns, although no row is entirely blank.

Rows that contain a NaN still have valuable data in their other columns, and columns that contain a NaN still have valuable data in their other rows.

The problem is that sns.pairplot does not ignore NaN values for the correlation; it raises errors instead (such as division by zero, string-to-float conversion, etc.).

I have seen people suggest the fillna() method, but I am hoping someone knows a more elegant way, one that does not mean spending hours afterwards fixing the plot, axes, filters, etc. I did not like that workaround.

It is similar to what this person reported:

https://github.com/mwaskom/seaborn/issues/1699

ZeroDivisionError: 0.0 cannot be raised to a negative power


Recommended answer

Seaborn's PairGrid class lets you create the plot you want. PairGrid is much more flexible than sns.pairplot. Every PairGrid has three sections: the upper triangle, the lower triangle and the diagonal.

For each section, you can define a customized plotting function. The upper and lower triangles can take any plotting function that accepts two feature arrays (such as plt.scatter) plus any associated keywords (e.g. marker). The diagonal takes a plotting function that accepts a single feature array (such as plt.hist) plus the relevant keywords.

For your purpose, you can filter out the NaNs inside your customized function(s):

from sklearn import datasets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data = datasets.load_iris()
iris = pd.DataFrame(data.data, columns=data.feature_names)

# break iris dataset to create NaNs
iris.iat[1, 0] = np.nan
iris.iat[4, 0] = np.nan
iris.iat[4, 2] = np.nan
iris.iat[5, 2] = np.nan

# customized scatter plot that first filters out NaNs in the feature pair
def scatterFilter(x, y, **kwargs):
    # keep only the rows where both features are present
    interimDf = pd.concat([x, y], axis=1)
    interimDf.columns = ['x', 'y']
    interimDf = interimDf.dropna()

    # draw on the axes that PairGrid has made current
    plt.plot(interimDf.x.values, interimDf.y.values, 'o', **kwargs)

# Create an instance of the PairGrid class
# (height replaces the size argument of older seaborn releases)
grid = sns.PairGrid(data=iris, vars=list(iris.columns), height=4)

# Map the filtered scatter plot to the upper triangle
grid = grid.map_upper(scatterFilter, color='darkred')

# Map a histogram to the diagonal
grid = grid.map_diag(plt.hist, bins=10, edgecolor='k', color='darkred')

# Map the filtered scatter plot to the lower triangle
grid = grid.map_lower(scatterFilter, color='darkred')

This produces the following plot (image: Iris Seaborn PairPlot).
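To see why filtering per feature pair (as scatterFilter does) keeps more of the data than dropping every incomplete row up front, the filtering step can be checked on its own. This is only a minimal sketch: the toy frame and the helper name pairwiseComplete are made up for illustration and are not part of the answer's code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: three of the five rows are missing exactly one value
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0],
    "b": [2.0, 1.0, np.nan, 4.0, 6.0],
    "c": [5.0, 3.0, 2.0, np.nan, 1.0],
})


def pairwiseComplete(x, y):
    """Return the values of x and y for rows where both are non-NaN."""
    pair = pd.concat([x, y], axis=1, keys=["x", "y"]).dropna()
    return pair["x"].to_numpy(), pair["y"].to_numpy()


# Dropping incomplete rows globally keeps only rows with no NaN anywhere
rows_global = len(df.dropna())

# Filtering per pair keeps every row that is complete for that pair
xa, yb = pairwiseComplete(df["a"], df["b"])
rows_pair_ab = len(xa)

print(rows_global, rows_pair_ab)  # 2 3
```

In this toy case the (a, b) panel keeps three points instead of two, because the row whose only gap is in column c is still usable for that pair.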

PairGrid also lets you draw contour plots, annotate the panels with descriptive statistics, and so on. For more detail, see here.
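As one possible illustration of annotating a panel with a descriptive statistic, the sketch below writes the Pearson correlation of the NaN-free pairs onto the current axes. The function name corrAnnotate and the demo data are assumptions made for this example; the function ignores the color and label keywords that PairGrid may pass in.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def corrAnnotate(x, y, **kwargs):
    """Annotate the current axes with Pearson r computed on NaN-free pairs."""
    # keep only the rows where both features are present
    pair = pd.concat([x, y], axis=1, keys=["x", "y"]).dropna()
    r = pair["x"].corr(pair["y"])

    # place the text in the middle of the panel, in axes coordinates
    ax = plt.gca()
    ax.annotate(f"r = {r:.2f}\nn = {len(pair)}",
                xy=(0.5, 0.5), xycoords="axes fraction",
                ha="center", va="center")
    return r


# Stand-alone check on hypothetical data (no PairGrid needed)
demo = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, np.nan, 10.0],
})
fig, ax = plt.subplots()
r = corrAnnotate(demo["a"], demo["b"])
```

With a grid built as in the answer, this could be mapped to one triangle, for example grid.map_upper(corrAnnotate), while the filtered scatter plot stays on the other.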
