Pandas dataframe: Operation per batch of rows
Problem description
I have a pandas DataFrame df for which I want to compute some statistics per batch of rows.
For example, let's say that I have a batch_size = 200000.
For each batch of batch_size rows, I would like to get the number of unique values in the ID column of my DataFrame.
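One way to get just the unique count per batch is to label every row with its batch number and group on that label. The sketch below makes several assumptions. Batches are consecutive chunks of batch_size rows taken by position, and the last batch may be shorter. The technique is a groupby with nunique, which replaces explicit splitting. The toy data and batch_size = 3 come from the example further down.

```python
import numpy as np
import pandas as pd

# Toy data matching the example in the question
df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3, 3, 3, 3]})
batch_size = 3

# Positional batch label: rows 0-2 -> 0, rows 3-5 -> 1, rows 6-8 -> 2
# (np.arange instead of df.index, so a non-default index does not matter)
batch = np.arange(len(df)) // batch_size

# Number of distinct ID values within each batch
unique_per_batch = df.groupby(batch)["ID"].nunique()
print(unique_per_batch)
```

With batch_size = 200000 the same code applies unchanged. groupby only needs one label per row, so the frame never has to be physically split.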
How can I do this?
Here is an example of what I want:
print(df)
>>
+-------+
| ID|
+-------+
| 1|
| 1|
| 2|
| 2|
| 2|
| 3|
| 3|
| 3|
| 3|
+-------+
batch_size = 3
my_new_function(df,batch_size)
>>
For batch 1 (0 to 2) :
2 unique values
1 appears 2 times
2 appears 1 time
For batch 2 (3 to 5) :
2 unique values
2 appears 2 times
3 appears 1 time
For batch 3 (6 to 8) :
1 unique value
3 appears 3 times
Note: the output can of course be a simple DataFrame.
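If a DataFrame is acceptable, the per-batch value counts can be built directly. The sketch below assumes consecutive positional batches, as in the question's example. It uses SeriesGroupBy.value_counts on a positional batch label and flattens the result with reset_index. This is one possible approach, not the method from the answer below. The column names "batch" and "count" are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3, 3, 3, 3]})
batch_size = 3

# Positional batch label for each row
batch = np.arange(len(df)) // batch_size

counts = (
    df.groupby(batch)["ID"]
    .value_counts()                   # occurrences of each ID within each batch
    .rename_axis(["batch", "ID"])     # name the two index levels
    .reset_index(name="count")        # flatten into a plain DataFrame
)
print(counts)
```

The number of unique values per batch is then the number of rows per batch. counts.groupby("batch").size() returns it.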
Recommended answer
See here for splitting the dataframe into batches. After that, for each batch batch_df, I would do:
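from collections import Counter
# batch_df is one batch of rows of df; count how often each ID occurs in it
counts = Counter(batch_df['ID'].tolist())
# len(counts) is the number of unique IDs in the batch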
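Putting the splitting and counting steps together, the sketch below reproduces the printout from the question. It assumes the batches are consecutive positional slices taken with DataFrame.iloc, which may differ from the splitting method linked above. my_new_function is the placeholder name from the question. Its body here is a hypothetical illustration built around Counter, not the answerer's code. It also returns the printed lines so they are easy to check.

```python
from collections import Counter

import pandas as pd


def my_new_function(df, batch_size):
    """Print and return a per-batch summary of the ID column.

    Batches are consecutive, positional slices of batch_size rows;
    the last batch may be shorter.
    """
    lines = []
    for n, start in enumerate(range(0, len(df), batch_size), start=1):
        batch_df = df.iloc[start:start + batch_size]  # one batch of rows
        counts = Counter(batch_df["ID"].tolist())     # ID -> occurrences
        last = start + len(batch_df) - 1              # position of last row
        lines.append(f"For batch {n} ({start} to {last}) :")
        lines.append(f"{len(counts)} unique value{'s' if len(counts) != 1 else ''}")
        # Counter keeps first-seen order, so values print in row order
        for value, times in counts.items():
            lines.append(f"{value} appears {times} time{'s' if times != 1 else ''}")
    print("\n".join(lines))
    return lines


df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3, 3, 3, 3]})
lines = my_new_function(df, batch_size=3)
```

Slicing with iloc creates a view or a small copy per batch, so memory stays bounded by batch_size even for large frames.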