Seaborn: Avoid plotting missing values (line plot)


Problem description

I want a line plot to indicate if a piece of data is missing such as:

However, the code below fills the missing data, creating a potentially misleading chart:

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# load csv
df=pd.read_csv('data.csv')
# plot a graph
g = sns.lineplot(x="Date", y="Data", data=df)
plt.show()

What should I change in my code to avoid filling missing values?

The CSV looks like this:

Date,Data
01-12-03,100
01-01-04,
01-02-04,
01-03-04,
01-04-04,
01-05-04,39
01-06-04,
01-07-04,
01-08-04,53
01-09-04,
01-10-04,
01-11-04,
01-12-04,
01-01-05,28
   ...
01-04-18,14
01-05-18,12
01-06-18,8
01-07-18,8

link to .csv: https://drive.google.com/file/d/1s-RJfAFYD90m4SrFDzIba7EQP4C-J0yO/view?usp=sharing

Solution

Based on Denziloe's answer, there are three options:

1) Use pandas or matplotlib.
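
A minimal sketch of this first option, assuming the dates are day-month-year with a two-digit year. The sample rows are made up to resemble the question's CSV and are not taken from the real file. Matplotlib leaves NaN values out of the line, so each missing value shows up as a gap:

```python
import io

import pandas as pd
from matplotlib import pyplot as plt

# Made-up sample shaped like the question's CSV (not the real file).
sample_csv = io.StringIO(
    "Date,Data\n"
    "01-01-04,100\n"
    "01-02-04,80\n"
    "01-03-04,\n"
    "01-04-04,\n"
    "01-05-04,39\n"
    "01-06-04,\n"
    "01-07-04,53\n"
    "01-08-04,28\n"
)
df = pd.read_csv(sample_csv)

# Assumption: dates are day-month-year with a two-digit year.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%y")

fig, ax = plt.subplots(figsize=(10, 5))

# The NaN rows are passed through unchanged, so the line breaks at each one.
# The marker keeps isolated values (such as 39 here) visible as single dots.
(line,) = ax.plot(df["Date"], df["Data"], marker="o")
fig.autofmt_xdate()
```

Call plt.show() afterwards to display the figure. Without marker="o", a value whose neighbours are both missing would not be visible at all, because there is nothing to connect it to.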

2) If you need seaborn: this is not what it is meant for, but for regularly spaced dates like those above, pointplot works out of the box.

fig, ax = plt.subplots(figsize=(10, 5))

plot = sns.pointplot(
    ax=ax,
    data=df, x="Date", y="Data"
)

ax.set_xticklabels([])

plt.show()

The graph built on the data from the question will look as below:

Pros:

  • easy to implement
  • an outlier in the data which is surrounded by None will be easy to notice on the graph

Cons:

  • it takes a long time to generate such a graph (compared to lineplot)
  • when there are many points it becomes hard to read such graphs

3) If you need seaborn and you need lineplot: the hue argument can be used to put the separate sections of the line into separate groups. We number the sections with a running count of the NaNs, so every unbroken run of values gets its own number.

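fig, ax = plt.subplots(figsize=(10, 5))

# Each NaN increments the counter, so every unbroken run gets its own label.
segment = df["Data"].isna().cumsum()

plot = sns.lineplot(
    ax=ax,
    data=df, x="Date", y="Data",
    hue=segment,
    # Map every label to the same colour; a dict covers all labels,
    # however many there turn out to be.
    palette={label: "blue" for label in segment.unique()},
    legend=False, markers=True
)

ax.set_xticklabels([])

plt.show()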

Pros:

  • lineplot
  • easy to read
  • generated faster than point plot

Cons:

  • an outlier in the data which is surrounded by None will not be drawn on the chart
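
To see how the section numbering works and why an isolated value disappears, here is a small sketch on made-up values (not the question's data). It counts how many real points land in each section. A section with a single point has nothing to connect to, so no line is drawn for it:

```python
import pandas as pd

# Made-up values; None marks a missing measurement.
data = pd.Series([100, 80, None, None, 39, None, 53, 28], name="Data")

# Each NaN bumps the counter, so values separated by a gap get different labels.
segment = data.isna().cumsum()

# Keep only the rows that hold a value and count the points in each section.
points_per_segment = segment[data.notna()].value_counts().sort_index()
print(points_per_segment.to_dict())  # {0: 2, 2: 1, 3: 2}
```

Section 2 contains only the value 39, which sits between missing values, so it would not appear on the line plot.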

The graph will look as below:
