Seaborn: Avoid plotting missing values (line plot)
I want a line plot that shows where data is missing, leaving a gap in the line instead of drawing through it.
However, the code below fills in the missing data, creating a potentially misleading chart:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
# load csv
df=pd.read_csv('data.csv')
# plot a graph
g = sns.lineplot(x="Date", y="Data", data=df)
plt.show()
What should I change in my code to avoid filling missing values?
The CSV looks like this:
Date,Data
01-12-03,100
01-01-04,
01-02-04,
01-03-04,
01-04-04,
01-05-04,39
01-06-04,
01-07-04,
01-08-04,53
01-09-04,
01-10-04,
01-11-04,
01-12-04,
01-01-05,28
...
01-04-18,14
01-05-18,12
01-06-18,8
01-07-18,8
Link to the .csv: https://drive.google.com/file/d/1s-RJfAFYD90m4SrFDzIba7EQP4C-J0yO/view?usp=sharing
Based on Denziloe's answer, there are three options:
1) Use pandas or matplotlib.
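Option 1 works because matplotlib does not connect points across NaN values. The following is a minimal sketch rather than a definitive solution: it inlines the first rows of the CSV from the question so that it is self-contained, and it assumes the dates are day-month-year with a two-digit year (a guess based on the sample rows).

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# First rows of the CSV from the question, inlined for a self-contained sketch.
csv_text = """Date,Data
01-12-03,100
01-01-04,
01-02-04,
01-03-04,
01-04-04,
01-05-04,39
01-06-04,
01-07-04,
01-08-04,53
01-09-04,
01-10-04,
01-11-04,
01-12-04,
01-01-05,28
"""

df = pd.read_csv(io.StringIO(csv_text))
# Assumption: dates are day-month-year with a two-digit year.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%y")

fig, ax = plt.subplots(figsize=(10, 5))
# matplotlib leaves a gap wherever y is NaN; the marker keeps values
# that have NaN on both sides visible as single points.
(line,) = ax.plot(df["Date"], df["Data"], marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Data")
```

Calling plt.show() afterwards displays the figure. In this short sample every value is isolated, so only markers appear; in the full data, runs of consecutive values would be joined by line segments.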
2) If you need seaborn: this is not what pointplot is meant for, but for regularly spaced dates like the ones above it works out of the box.
fig, ax = plt.subplots(figsize=(10, 5))
plot = sns.pointplot(
ax=ax,
data=df, x="Date", y="Data"
)
ax.set_xticklabels([])
plt.show()
[Figure: point plot built from the data in the question]
Pros:
- easy to implement
- an outlier in the data that is surrounded by None is easy to notice on the graph
Cons:
- it takes a long time to generate such a graph (compared to lineplot)
- when there are many points, such graphs become hard to read
3) If you need seaborn and you need lineplot: the hue argument can be used to put the separate sections in separate buckets. We number the sections using the cumulative count of NaNs, so the number increases every time a missing value occurs.
fig, ax = plt.subplots(figsize=(10, 5))
# Section id for each row: increases by one at every NaN
section = df["Data"].isna().cumsum()
plot = sns.lineplot(
    ax=ax,
    data=df, x="Date", y="Data",
    hue=section,
    # one (identical) color per hue level
    palette=["blue"] * section.nunique(),
    legend=False, markers=True,
)
ax.set_xticklabels([])
plt.show()
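To see why the cumulative NaN count separates the sections, here is a small sketch on a made-up series (the values are hypothetical and chosen only for illustration, they are not from the question's data). Each NaN bumps the counter, so values on either side of a gap never share a section id.

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate the numbering.
data = pd.Series([1.0, 2.0, None, None, 3.0, 4.0, None, 5.0], name="Data")

# The counter increases by one at every NaN.
section = data.isna().cumsum()
print(section.tolist())  # [0, 0, 1, 2, 2, 2, 3, 3]

# Group the non-missing values by their section id.
valid = data.notna()
groups = {int(k): v.tolist() for k, v in data[valid].groupby(section[valid])}
print(groups)  # {0: [1.0, 2.0], 2: [3.0, 4.0], 3: [5.0]}
```

Section ids that contain only NaN (id 1 here) have nothing to draw, and a section with a single value (id 3 here) has no neighbor to connect to.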
Pros:
- it is a lineplot
- easy to read
- generated faster than the point plot
Cons:
- an outlier in the data that is surrounded by None will not be drawn on the chart
[Figure: line plot with gaps where values are missing]