Pandas DataFrame: Operation per batch of rows

Question

I have a pandas DataFrame df for which I want to compute some statistics per batch of rows.

For example, let's say that I have a batch_size = 200000.

For each batch of batch_size rows I would like to have the number of unique values for a column ID of my DataFrame.

How can I do this?

Here is an example of what I want:

print(df)

>>
+-------+
|     ID|
+-------+
|      1|
|      1|
|      2|
|      2|
|      2|
|      3|
|      3|
|      3|
|      3|
+-------+

batch_size = 3

my_new_function(df,batch_size)

>>
For batch 1 (0 to 2) :
2 unique values 
1 appears 2 times
2 appears 1 time

For batch 2 (3 to 5) : 
2 unique values 
2 appears 2 times
3 appears 1 time

For batch 3 (6 to 8) :
1 unique value
3 appears 3 times

Note : The output can of course be a simple DataFrame

Recommended answer

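First split the DataFrame into batches of batch_size rows (the original answer linked to a separate post on how to do that). Then, for each batch batch_df, I would do: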

from collections import Counter
Counter(batch_df['ID'].tolist())
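
The answer does not show the splitting step, so here is a minimal sketch of one way to do it, assuming batches are consecutive blocks of rows taken by position. The helper name iter_batches and the variables counts_per_batch and summary are hypothetical and not part of the original answer. The block rebuilds the example DataFrame from the question, applies Counter to each batch, and also shows a groupby-based alternative that returns a simple DataFrame.

```python
from collections import Counter

import numpy as np
import pandas as pd


def iter_batches(df, batch_size):
    """Yield (batch_number, batch_df) for consecutive blocks of batch_size rows.

    Hypothetical helper: batches are taken by row position, numbered from 1,
    and the last batch may be shorter than batch_size.
    """
    for start in range(0, len(df), batch_size):
        yield start // batch_size + 1, df.iloc[start:start + batch_size]


# Example data from the question.
df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3, 3, 3, 3]})
batch_size = 3

# Approach from the answer: one Counter per batch.
counts_per_batch = {
    batch_no: Counter(batch_df["ID"].tolist())
    for batch_no, batch_df in iter_batches(df, batch_size)
}

for batch_no, counts in counts_per_batch.items():
    print(f"Batch {batch_no}: {len(counts)} unique value(s), counts = {dict(counts)}")

# Alternative that stays in pandas: label each row with its batch number
# (integer division of the row position by batch_size) and group on that label.
batch_labels = np.arange(len(df)) // batch_size + 1
summary = (
    df.groupby(batch_labels)["ID"]
    .nunique()
    .rename("n_unique")
    .rename_axis("batch")
    .reset_index()
)
print(summary)
```

The Counter version gives the per-value counts the question asks for, while the groupby version avoids a Python-level loop, which may matter with batches of 200000 rows. Both rely on row position rather than the index, so they should behave the same whether or not the DataFrame has a default RangeIndex.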
