如何计算 pandas 数据框中各列的非NaN值? [英] How to count non NaN values accross columns in pandas dataframe?

查看：91 发布时间：2020/5/24 1:10:18 python python-2.7 pandas dataframe

本文介绍了如何计算 pandas 数据框中各列的非NaN值?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的数据如下:

            Close   a   b   c   d   e   Time    
2015-12-03  2051.25 5   4   3   1   1   05:00:00    
2015-12-04  2088.25 5   4   3   1   NaN 06:00:00
2015-12-07  2081.50 5   4   3   NaN NaN 07:00:00
2015-12-08  2058.25 5   4   NaN NaN NaN 08:00:00
2015-12-09  2042.25 5   NaN NaN NaN NaN 09:00:00

我需要水平"计数[Na]以外的[a]到[e]列中的值.结果就是这样:

I need to count 'horizontally' the values in the columns ['a'] to ['e'] that are not NaN. So the outcome would be this:

df['Count'] = .....
df

            Close   a   b   c   d   e   Time     Count
2015-12-03  2051.25 5   4   3   1   1   05:00:00 5  
2015-12-04  2088.25 5   4   3   1   NaN 06:00:00 4
2015-12-07  2081.50 5   4   3   NaN NaN 07:00:00 3
2015-12-08  2058.25 5   4   NaN NaN NaN 08:00:00 2
2015-12-09  2042.25 5   NaN NaN NaN NaN 09:00:00 1

谢谢

推荐答案

您可以从df中进行子选择，并通过axis=1来调用count:

You can subselect from your df and call count passing axis=1:

In [24]:
df['count'] = df[list('abcde')].count(axis=1)
df

Out[24]:
              Close  a   b   c   d   e      Time  count
2015-12-03  2051.25  5   4   3   1   1  05:00:00      5
2015-12-04  2088.25  5   4   3   1 NaN  06:00:00      4
2015-12-07  2081.50  5   4   3 NaN NaN  07:00:00      3
2015-12-08  2058.25  5   4 NaN NaN NaN  08:00:00      2
2015-12-09  2042.25  5 NaN NaN NaN NaN  09:00:00      1

时间

In [25]:
%timeit df[['a', 'b', 'c', 'd', 'e']].apply(lambda x: sum(x.notnull()), axis=1)
%timeit df.drop(['Close', 'Time'], axis=1).count(axis=1)
%timeit df[list('abcde')].count(axis=1)

100 loops, best of 3: 3.28 ms per loop
100 loops, best of 3: 2.76 ms per loop
100 loops, best of 3: 2.98 ms per loop

apply是最慢的，这不足为奇，drop版本略快，但从语义上讲，我更喜欢只传递感兴趣的cols列表并调用count以提高可读性

apply is the slowest which is not a surprise, the drop version is marginally faster but semantically I prefer just passing the list of cols of interest and calling count for readability

嗯，我现在的时机不断变化:

Hmm I keep getting varying timings now:

In [27]:
%timeit df[['a', 'b', 'c', 'd', 'e']].apply(lambda x: sum(x.notnull()), axis=1)
%timeit df.drop(['Close', 'Time'], axis=1).count(axis=1)
%timeit df[list('abcde')].count(axis=1)
%timeit df[['a', 'b', 'c', 'd', 'e']].count(axis=1)

100 loops, best of 3: 3.33 ms per loop
100 loops, best of 3: 2.7 ms per loop
100 loops, best of 3: 2.7 ms per loop
100 loops, best of 3: 2.57 ms per loop

更多时间

In [160]:
%timeit df[['a', 'b', 'c', 'd', 'e']].apply(lambda x: sum(x.notnull()), axis=1)
%timeit df.drop(['Close', 'Time'], axis=1).count(axis=1)
%timeit df[list('abcde')].count(axis=1)
%timeit df[['a', 'b', 'c', 'd', 'e']].count(axis=1)
%timeit df[list('abcde')].notnull().sum(axis=1) 

1000 loops, best of 3: 1.4 ms per loop
1000 loops, best of 3: 1.14 ms per loop
1000 loops, best of 3: 1.11 ms per loop
1000 loops, best of 3: 1.11 ms per loop
1000 loops, best of 3: 1.05 ms per loop

在该数据集上，测试notnull并求和(因为notnull将产生布尔掩码)似乎更快

It seems that testing for notnull and summing (as notnull will produce a boolean mask) is quicker on this dataset

在5万行df中，最后一种方法要快一些:

On a 50k row df the last method is slightly quicker:

In [172]:
%timeit df[['a', 'b', 'c', 'd', 'e']].apply(lambda x: sum(x.notnull()), axis=1)
%timeit df.drop(['Close', 'Time'], axis=1).count(axis=1)
%timeit df[list('abcde')].count(axis=1)
%timeit df[['a', 'b', 'c', 'd', 'e']].count(axis=1)
%timeit df[list('abcde')].notnull().sum(axis=1) 

1 loops, best of 3: 5.83 s per loop
100 loops, best of 3: 6.15 ms per loop
100 loops, best of 3: 6.49 ms per loop
100 loops, best of 3: 6.04 ms per loop

这篇关于如何计算 pandas 数据框中各列的非NaN值?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何计算 pandas 数据框中各列的非NaN值? [英] How to count non NaN values accross columns in pandas dataframe?

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

如何计算 pandas 数据框中各列的非NaN值? [英] How to count non NaN values accross columns in pandas dataframe?

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭