Seaborn pairplot error when dataset has NaN values
Problem description
I have a pandas DataFrame whose first column holds categorical data and whose remaining columns are filled with numbers. As you would expect from real engineering data, there are NaN values (and zeros) in multiple rows across different columns, though no row is entirely blank.
Those rows have valuable data in their other columns, which are not NaN, and those columns have valuable data in other rows, which are also not NaN.
The problem is that sns.pairplot does not ignore NaN values for correlation and instead raises errors (such as division by zero, string-to-float conversion, etc.).
I have seen some people suggest the fillna() method, but I am hoping someone knows a more elegant way to do this, one that does not mean spending hours afterwards fixing the plot, axes, filters, etc. I did not like that workaround.
This is similar to what this person reported:
https://github.com/mwaskom/seaborn/issues/1699
ZeroDivisionError: 0.0 cannot be raised to a negative power
Recommended answer
Seaborn's PairGrid class will allow you to create your desired plot. PairGrid is much more flexible than sns.pairplot. Any PairGrid you create has three sections: the upper triangle, the lower triangle and the diagonal.
For each section, you can define a customized plotting function. The upper and lower triangle sections can take any plotting function that accepts two arrays of features (such as plt.scatter) as well as any associated keywords (e.g. marker). The diagonal section accepts a plotting function that takes a single feature array as input (such as plt.hist) in addition to the relevant keywords.
For your purpose, you can filter out the NaNs in your customized function(s):
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

data = datasets.load_iris()
iris = pd.DataFrame(data.data, columns=data.feature_names)

# break iris dataset to create NaNs
iris.iat[1, 0] = np.nan
iris.iat[4, 0] = np.nan
iris.iat[4, 2] = np.nan
iris.iat[5, 2] = np.nan

# create customized scatterplot that first filters out NaNs in feature pair
def scatterFilter(x, y, **kwargs):
    interimDf = pd.concat([x, y], axis=1)
    interimDf.columns = ['x', 'y']
    # keep only the rows where both features are present
    interimDf = interimDf.dropna()
    # plt.plot draws on the current axes, which PairGrid sets for each panel
    plt.plot(interimDf.x.values, interimDf.y.values, 'o', **kwargs)

# Create an instance of the PairGrid class.
# (height replaces the older size argument from seaborn 0.9 onwards)
grid = sns.PairGrid(data=iris, vars=list(iris.columns), height=4)

# Map the NaN-filtering scatter plot to the upper triangle
grid = grid.map_upper(scatterFilter, color='darkred')

# Map a histogram to the diagonal
grid = grid.map_diag(plt.hist, bins=10, edgecolor='k', color='darkred')

# Map the same NaN-filtering scatter plot to the lower triangle
grid = grid.map_lower(scatterFilter, color='darkred')
This produces the following plot: (image: Iris Seaborn PairPlot)
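Depending on the seaborn version and the grid's dropna setting, the function mapped to the diagonal may also receive NaN values, which plt.hist may not handle. The following is a minimal sketch of a NaN-filtering histogram in the same spirit as the scatter function above. The name hist_filter and the sample values are hypothetical and are not part of the original answer.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def hist_filter(x, **kwargs):
    """Histogram of a single feature, ignoring NaN entries (hypothetical helper)."""
    # Drop missing values before handing the data to matplotlib
    values = pd.Series(x).dropna().to_numpy()
    # plt.hist draws on the current axes, like the scatter function above
    return plt.hist(values, **kwargs)


# Stand-alone check on a small made-up column that contains NaNs
sample = pd.Series([1.0, 2.0, np.nan, 2.5, np.nan, 4.0])
fig, ax = plt.subplots()
counts, edges, patches = hist_filter(sample, bins=3, edgecolor="k")
print(int(counts.sum()))  # number of non-NaN values that were binned
```

If this behaves as intended in your environment, it could be passed to the grid in place of plt.hist, for example grid.map_diag(hist_filter, bins=10, edgecolor='k', color='darkred').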
PairGrid also allows you to plot contour plots, annotate the panels with descriptive statistics etc. For more detail, see here.
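As an illustration of annotating panels with a descriptive statistic, here is a minimal sketch that writes the Pearson correlation into a panel, computed only over rows where both features are present. The names pairwise_r and annotate_r, the text position, and the sample values are assumptions made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def pairwise_r(x, y):
    """Pearson r over the rows where both x and y are present."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))
    return np.corrcoef(x[keep], y[keep])[0, 1]


def annotate_r(x, y, **kwargs):
    """Write the pairwise correlation into the current axes (hypothetical helper)."""
    # **kwargs absorbs keywords such as color or label that the grid may pass along
    ax = plt.gca()
    r = pairwise_r(x, y)
    return ax.annotate(f"r = {r:.2f}", xy=(0.05, 0.9), xycoords="axes fraction")


# Stand-alone check: after dropping incomplete pairs, y is exactly 2 * x
x = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
y = pd.Series([2.0, 4.0, 6.0, np.nan, 10.0])
fig, ax = plt.subplots()
note = annotate_r(x, y, color="darkred")
print(note.get_text())
```

A function like this could be mapped onto one triangle of the grid, for example grid.map_upper(annotate_r), while the other triangle keeps the scatter plot.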