Altair missing value in graph


Question

import altair as alt
import numpy as np
import pandas as pd

#create dataframe
df = pd.DataFrame({'date': ['2020-04-03', '2020-04-04', '2020-04-05', '2020-04-06','2020-04-03', '2020-04-04','2020-04-05','2020-04-06'],
                    'ID': ['a','a','a','a','b','b','b','b'],'bar': [np.nan,8,np.nan,np.nan, np.nan, 8,np.nan,np.nan],
                    'line': [8,np.nan,10,8, 4, 5,6,7] })

df:
         date ID  bar  line
0  2020-04-03  a  NaN   8.0
1  2020-04-04  a  8.0   NaN
2  2020-04-05  a  NaN  10.0
3  2020-04-06  a  NaN   8.0
4  2020-04-03  b  NaN   4.0
5  2020-04-04  b  8.0   5.0
6  2020-04-05  b  NaN   6.0
7  2020-04-06  b  NaN   7.0

# create graph
bars = alt.Chart(df).mark_bar(color="grey", size=5).encode(
         alt.X('monthdate(date):O'), y='bar:Q')

lines = alt.Chart(df).mark_line(point=True,size=2,).encode(
            alt.X('monthdate(date):O'), y='line:Q')

alt.layer(bars + lines,width=350,height=150).facet(facet=alt.Facet('ID:N'),
    ).resolve_axis(y='independent',x='independent')

It gives this image:

Does anyone know why the line has a break (in facet a), and how to draw the line through the missing data point? I know I could use "impute" to calculate the mean and replace the missing value, but that would imply a data point for that date, which does not actually exist.

Thanks for any hints, ideas or help!

Recommended answer

The break appears because the value is recorded as NaN in the dataframe. That observation has a valid date entry but a NaN on the y-axis, which cannot be plotted, so the line stops at the previous point and resumes at the next one.

This is what you currently have:

df = pd.DataFrame({'date': ['2020-04-03', '2020-04-04', '2020-04-05', '2020-04-06','2020-04-03', '2020-04-04','2020-04-05','2020-04-06'],
                    'ID': ['a','a','a','a','b','b','b','b'],
                    'line': [8,np.nan,10,8, 4, 5,6,7] })

alt.Chart(df).mark_line(point=True,size=2,).encode(
            alt.X('monthdate(date):O'), y='line:Q')

If you drop the NaNs, you will get the behavior that you want:

alt.Chart(df.dropna()).mark_line(point=True,size=2).encode(
            alt.X('monthdate(date):O'), y='line:Q')
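
One caveat is worth checking before relying on a bare dropna(): it removes every row that has a NaN in any column. The sketch below assumes the full dataframe from the question, including the bar column, and compares a plain dropna() with dropna(subset=['line']), which only looks at the line column. The variable names all_dropped and line_only are illustrative.

```python
import numpy as np
import pandas as pd

# Full dataframe from the question, including the sparse 'bar' column
df = pd.DataFrame({
    "date": ["2020-04-03", "2020-04-04", "2020-04-05", "2020-04-06"] * 2,
    "ID": ["a"] * 4 + ["b"] * 4,
    "bar": [np.nan, 8, np.nan, np.nan, np.nan, 8, np.nan, np.nan],
    "line": [8, np.nan, 10, 8, 4, 5, 6, 7],
})

# Drops a row if ANY column is NaN, so the sparse 'bar' column removes almost everything
all_dropped = df.dropna()

# Drops a row only if 'line' is NaN, keeping the other line observations
line_only = df.dropna(subset=["line"])

print(len(all_dropped), len(line_only))
```

With the bar column present, the plain dropna() leaves very few rows, so restricting it to the line column is the safer choice when a dataframe is passed to the line chart on its own.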

For your example above, if you want the bar plot to retain all values and not drop the rows with NaN in the line column, while still using both layer and facet, you need to reference the same dataframe in both charts and use Altair's transform_filter instead of pandas dropna:

(alt.Chart(df).mark_line(point=True,size=2)
 .transform_filter('isValid(datum.line)')
 .encode(alt.X('monthdate(date):O'), y='line:Q'))
