Counting changes of value in each column in a data frame in pandas


Question

Is there any neat way to count the number of changes of value in each column in a data frame in pandas?

I would rather not loop over each column myself, e.g.:

import pandas as pd

frame = pd.DataFrame({
    'time':[1234567000,1234567005,1234567009],
    'X1':[96.32,96.01,96.05],
    'X2':[23.88,23.96,23.96]
},columns=['time','X1','X2']) 

print(frame)

changes = []
for column_name in frame.columns.values:
    print('column_name: {0}'.format(column_name))
    changes.append(sum(frame[column_name]!=frame[column_name].shift(1)))

print('changes: {0}'.format(changes))

This returns:

         time     X1     X2
0  1234567000  96.32  23.88
1  1234567005  96.01  23.96
2  1234567009  96.05  23.96
column_name: time
column_name: X1
column_name: X2
changes: [3, 3, 2]

Recommended answer

If the values are numeric you could take the differences between adjacent rows and test if the difference is non-zero. Then take a sum down each column to count the number of changes in value:

In [48]: (frame.diff(axis=0) != 0).sum(axis=0)
Out[48]: 
time    3
X1      3
X2      2
dtype: int64
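
Note that the counts above include the first row: diff leaves NaN there, and NaN != 0 evaluates to True. A minimal sketch, assuming you only want transitions between adjacent rows, drops that first row before summing (the names with_first_row and transitions are illustrative, not from the original answer):

```python
import pandas as pd

frame = pd.DataFrame({
    'time': [1234567000, 1234567005, 1234567009],
    'X1': [96.32, 96.01, 96.05],
    'X2': [23.88, 23.96, 23.96],
}, columns=['time', 'X1', 'X2'])

# diff() leaves NaN in the first row, and NaN != 0 evaluates to True,
# so the first row is counted as a change in every column.
with_first_row = (frame.diff(axis=0) != 0).sum(axis=0)

# Dropping the first row counts only transitions between adjacent rows.
transitions = (frame.diff(axis=0).iloc[1:] != 0).sum(axis=0)

print(with_first_row.tolist())  # [3, 3, 2]
print(transitions.tolist())     # [2, 2, 1]
```

Which of the two counts is wanted depends on whether the first value should itself be treated as a change.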

If the values are not necessarily numeric, a more general way is to compare the frame against itself shifted down by one row. This is similar to the code you posted, except that the operation is done on the entire DataFrame instead of column by column:

In [50]: (frame != frame.shift(axis=0)).sum(axis=0)
Out[50]: 
time    3
X1      3
X2      2
dtype: int64
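
To illustrate the non-numeric case, here is a small sketch using a hypothetical frame with a string column (the frame mixed and its column names are made up for this example). diff relies on subtraction, which is not defined for strings, whereas the shifted comparison only needs !=:

```python
import pandas as pd

# Hypothetical frame with a string column, for illustration only.
mixed = pd.DataFrame({
    'state': ['on', 'on', 'off', 'off', 'on'],
    'level': [1, 1, 2, 2, 2],
})

# Comparing against the shifted frame works for any dtype that supports !=.
# As with diff, the first row compares against a missing value and is
# therefore counted as a change.
changes = (mixed != mixed.shift(axis=0)).sum(axis=0)

print(changes.to_dict())  # {'state': 3, 'level': 2}
```

One caveat: because NaN != NaN is True, two consecutive missing values in a column are also counted as a change by this comparison.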

The numeric version is faster; the shifted version is more robust, since it also works for non-numeric columns.
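
If you want to check the speed difference on your own data, a rough sketch with timeit could look like the following. The test frame big, its size, and the helper names count_with_diff and count_with_shift are arbitrary assumptions for this example; actual timings will vary with pandas version, dtypes, and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: size and value range are arbitrary choices.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 3, size=(100_000, 5)),
                   columns=list('abcde'))


def count_with_diff(df):
    # Numeric-only approach: non-zero differences between adjacent rows.
    return (df.diff(axis=0) != 0).sum(axis=0)


def count_with_shift(df):
    # General approach: compare each row with the previous one.
    return (df != df.shift(axis=0)).sum(axis=0)


# Both approaches should agree on purely numeric data without NaN.
assert count_with_diff(big).equals(count_with_shift(big))

for func in (count_with_diff, count_with_shift):
    seconds = timeit.timeit(lambda: func(big), number=20)
    print('{0}: {1:.4f} s for 20 runs'.format(func.__name__, seconds))
```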
