Plotting a pandas dataframe column which contains NaN values


Question

I'm having some issues plotting a second column from a pandas dataframe onto a twinx y-axis. I think it might be because the second, problematic column contains NaN values. The NaN values are there because data was only available every 10th year, whereas the first column has data for every year. They were generated using np.nan, as shown at the end of this post for clarity.

The intuition here is to plot both series on the same x-axis to show how they trend over time.

Here's my code and dataframe:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

list1 = ['1297606', '1300760', '1303980', '1268987', '1333521', '1328570', 
         '1328112', '1353671', '1371285', '1396658', '1429247', '1388937', 
         '1359145', '1330414', '1267415', '1210883', '1221585', '1186039', 
         '884273', '861789', '857475', '853485', '854122', '848163', '839226', 
         '820151', '852385', '827609', '825564', '789217', '765651']

list1a = [1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 
          1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 
          2004, 2005, 2006, 2007, 2008, 2009, 2010]

list3b = [121800016.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 
          145279588.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 
          160515434.5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 
          168140487.0]

d = {'Year': list1a,'Abortions per Year': list1, 
     'Affiliation with Religious Institutions': list3b}
newdf = pd.DataFrame(data=d)

newdf.set_index('Year',inplace=True)

fig, ax1 = plt.subplots(figsize=(20,5))

y2min = min(newdf['Affiliation with Religious Institutions'])
y2max = max(newdf['Affiliation with Religious Institutions'])
ax1.plot(newdf['Abortions per Year'])
#ax1.set_xticks(newdf.index)
ax1b = ax1.twinx()
ax1b.set_ylim(y2min*0.8,y2max*1.2)
ax1b.plot(newdf['Affiliation with Religious Institutions'])
plt.show()

I end up with a chart that doesn't show the second plot. (When I changed the second series to have numeric values for every year, it was plotted.) Here's the second plot (with NaN values), which is being ignored:

Thanks for any advice.

*How the np.nan values were generated for the second column: I looped through the index column and, for every year without data, appended np.nan to the list, which was then made a column.

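# list3a (presumably the years that have data) and list2 (the matching values)
# are not shown in this post
list3b = []
j = 0  # position of the next value to take from list2
for year in list1a:
    if year in list3a:
        list3b.append(list2[j])
        j += 1
    else:
        list3b.append(np.nan)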
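
For reference, the same NaN-padded list can probably be built without an explicit loop. The sketch below is not the asker's code: it assumes the once-per-decade observations are held in a pandas Series keyed by year (the name decade_values is hypothetical, and its values are the four non-NaN entries of list3b), and lets reindex fill the missing years.

```python
import pandas as pd

# Years covered by the yearly column (1980..2010, as in list1a).
all_years = list(range(1980, 2011))

# Hypothetical container for the once-per-decade observations; the values
# are the four non-NaN entries of list3b from the question.
decade_values = pd.Series(
    [121800016.0, 145279588.0, 160515434.5, 168140487.0],
    index=[1980, 1990, 2000, 2010],
)

# reindex aligns on the year labels; years without an observation become NaN.
list3b = decade_values.reindex(all_years).tolist()
```

Aligning on labels avoids keeping a separate counter in sync with the year list.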

Recommended answer

Two things. First, you need to convert the Abortions per Year column to a numeric type for plotting, at least for the data you provided, which is in str format. Second, you can plot Affiliation with Religious Institutions as a line by dropping the NaN values before plotting: matplotlib does not connect points across NaN, so values separated by NaN produce no visible line segments.

ax1.plot(newdf['Abortions per Year'].astype(int))

...

ax1b.plot(newdf['Affiliation with Religious Institutions'].dropna())
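
Putting the two changes together, a minimal end-to-end sketch might look like the following. It uses only the first eleven years of the question's data to stay short. The Agg backend, the marker and color choices, and the output file name twinx_nan.png are assumptions added for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Shortened copy of the question's data (1980..1990): yearly counts stored as
# strings, and a second column with a value only every 10th year.
years = list(range(1980, 1991))
counts = ['1297606', '1300760', '1303980', '1268987', '1333521', '1328570',
          '1328112', '1353671', '1371285', '1396658', '1429247']
sparse = [121800016.0] + [np.nan] * 9 + [145279588.0]

newdf = pd.DataFrame({'Year': years,
                      'Abortions per Year': counts,
                      'Affiliation with Religious Institutions': sparse}).set_index('Year')

# 1) Convert the string column to integers before plotting.
abortions = newdf['Abortions per Year'].astype(int)
# 2) Drop the NaN rows so the remaining points are adjacent and get connected.
affiliation = newdf['Affiliation with Religious Institutions'].dropna()

fig, ax1 = plt.subplots(figsize=(20, 5))
ax1.plot(abortions)

ax1b = ax1.twinx()
ax1b.set_ylim(affiliation.min() * 0.8, affiliation.max() * 1.2)
# The marker makes the individual observations visible on the line.
line, = ax1b.plot(affiliation, color='tab:orange', marker='o')

fig.savefig('twinx_nan.png')  # hypothetical output file name
```

If the NaN rows should stay in the column, passing a marker to the original plot call should at least show the isolated points, although they will not be joined by a line.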
