Getting the last non-NaN value across rows in a pandas dataframe

Question

I have a dataframe of shape (40, 500). Each row holds numerical values up to some column number k, which varies from row to row, and every entry after that is NaN.

I am trying to get the value in the last non-NaN column of each row. Is there a way to do this without looping through all the rows of the dataframe?

Sample dataframe:

2016-06-02 7.080 7.079 7.079 7.079 7.079 7.079   nan   nan   nan
2016-06-08 7.053 7.053 7.053 7.053 7.053 7.054   nan   nan   nan  
2016-06-09 7.061 7.061 7.060 7.060 7.060 7.060   nan   nan   nan   
2016-06-14   nan   nan   nan   nan   nan   nan   nan   nan   nan  
2016-06-15 7.066 7.066 7.066 7.066   nan   nan   nan   nan   nan  
2016-06-16 7.067 7.067 7.067 7.067 7.067 7.067 7.068 7.068   nan  
2016-06-21 7.053 7.053 7.052   nan   nan   nan   nan   nan   nan  
2016-06-22 7.049 7.049   nan   nan   nan   nan   nan   nan   nan  
2016-06-28 7.058 7.058 7.059 7.059 7.059 7.059 7.059 7.059 7.059  

Required output:

2016-06-02 7.079 
2016-06-08 7.054
2016-06-09 7.060
2016-06-14   nan 
2016-06-15 7.066
2016-06-16 7.068 
2016-06-21 7.052 
2016-06-22 7.049
2016-06-28 7.059  

Recommended answer

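You need last_valid_index inside a custom function. If all values in a row are NaN, last_valid_index() returns None, and indexing the row with None raises a KeyError, so that case has to be handled explicitly: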

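import numpy as np

def f(x):
    # last_valid_index() returns None when the whole row is NaN
    idx = x.last_valid_index()
    if idx is None:
        return np.nan
    return x[idx]

# apply f to each row (axis=1) and store the result in a new column
df['status'] = df.apply(f, axis=1)
print(df)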
                1      2      3      4      5      6      7      8      9  \
0                                                                           
2016-06-02  7.080  7.079  7.079  7.079  7.079  7.079    NaN    NaN    NaN   
2016-06-08  7.053  7.053  7.053  7.053  7.053  7.054    NaN    NaN    NaN   
2016-06-09  7.061  7.061  7.060  7.060  7.060  7.060    NaN    NaN    NaN   
2016-06-14    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   
2016-06-15  7.066  7.066  7.066  7.066    NaN    NaN    NaN    NaN    NaN   
2016-06-16  7.067  7.067  7.067  7.067  7.067  7.067  7.068  7.068    NaN   
2016-06-21  7.053  7.053  7.052    NaN    NaN    NaN    NaN    NaN    NaN   
2016-06-22  7.049  7.049    NaN    NaN    NaN    NaN    NaN    NaN    NaN   
2016-06-28  7.058  7.058  7.059  7.059  7.059  7.059  7.059  7.059  7.059   

            status  
0                   
2016-06-02   7.079  
2016-06-08   7.054  
2016-06-09   7.060  
2016-06-14     NaN  
2016-06-15   7.066  
2016-06-16   7.068  
2016-06-21   7.052  
2016-06-22   7.049  
2016-06-28   7.059  
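
For readers who want to try this without the original data, here is a minimal self-contained sketch of the same idea. The three-row `sample` frame and the name `last_valid_value` are made up for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: values followed by trailing NaN, plus one all-NaN row.
sample = pd.DataFrame(
    [[7.080, 7.079, np.nan],
     [np.nan, np.nan, np.nan],
     [7.058, 7.059, 7.059]],
    index=["2016-06-02", "2016-06-14", "2016-06-28"],
    columns=[1, 2, 3],
)

def last_valid_value(row):
    # Label of the last non-NaN entry, or None if the row is entirely NaN.
    idx = row.last_valid_index()
    return np.nan if idx is None else row.loc[idx]

# Row-wise apply returns one value per row as a Series.
last_by_apply = sample.apply(last_valid_value, axis=1)
print(last_by_apply)
```

Note that apply with axis=1 still calls the Python function once per row, so it avoids an explicit loop in your code but is not vectorized.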

Alternative solution: forward-fill along each row with ffill(axis=1), so that the last non-NaN value is carried into the final column, then select that column with iloc:

df['status'] = df.ffill(axis=1).iloc[:, -1]
print (df)
            status  
0                   
2016-06-02   7.079  
2016-06-08   7.054  
2016-06-09   7.060  
2016-06-14     NaN  
2016-06-15   7.066  
2016-06-16   7.068  
2016-06-21   7.052  
2016-06-22   7.049  
2016-06-28   7.059  
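
The forward-fill approach can be checked on a small frame as well. The sketch below uses a made-up `sample` frame (not the asker's data) and adds one row with a NaN in the middle, a case the question does not contain, to show that the last column after ffill still holds the last non-NaN value of the row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the third row has an interior gap, which the
# question's data does not have, to show the behaviour in that case.
sample = pd.DataFrame(
    [[7.080, 7.079, np.nan, np.nan],
     [np.nan, np.nan, np.nan, np.nan],
     [7.053, np.nan, 7.052, np.nan],
     [7.058, 7.059, 7.059, 7.059]],
    index=pd.to_datetime(["2016-06-02", "2016-06-14", "2016-06-21", "2016-06-28"]),
    columns=[1, 2, 3, 4],
)

# Propagate the last seen value to the right within each row,
# then take the final column; all-NaN rows stay NaN.
last_by_ffill = sample.ffill(axis=1).iloc[:, -1]
print(last_by_ffill)
```

This version needs no special handling for all-NaN rows, since there is nothing to fill and the last column simply remains NaN.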
