How to count consecutive string values of one column grouped by column values of another in a dataframe?


Problem description

I have the following dataframe:


|Levels|Labels|Confidence|
|------|------|----------|
|0     |Hands |0.8       |
|0     |Leg   |0.7       |
|0     |Eye   |0.9       |
|1     |Ear   |0.9       |
|1     |Eye   |0.8       |
|2     |Hands |0.9       |
|2     |Eye   |0.8       |
|3     |Eye   |0.8       |
|...   |...   |...       |

I want to check whether any of my labels appear in consecutive levels (0, 1, 2, 3, 4, 5, ...) and, if so, for how many consecutive levels (that is, the count of consecutive levels for each body part). Here is my example dataset: the label "Eye" is present for 4 consecutive levels, "Hands" for 1, and so on.

There is a similar question here: How to find the count of consecutive same string values in a pandas dataframe?
Modifying the solution given there did not work for me. I also tried converting the data to a NumPy array, which did not work either.

Could you take a look at this?

Recommended answer

This should work. Just define a custom aggregation function.

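import pandas as pd

# Sample data: rows are sorted by level and each label occurs at most once per level
df = pd.DataFrame({
    'lvl': [0, 0, 0, 1, 1, 2, 2, 3, 3, 3, 4],
    'label': ['a', 'b', 'c', 'a', 'b', 'a', 'c', 'a', 'b', 'c', 'c'],
    'confidence': [0.1, 0.5, 0.3, 0.6, 0.2, 0.4, 0.7, 0.8, 0.5, 0.2, 0.8]
})

# Per label: flag levels that are not (previous level + 1), i.e. the start of a new run,
# number the runs with cumsum, then take the size of the largest run
agg_func = {
    'lvl': [('length', lambda x: x.ne((x+1).shift()).cumsum().value_counts().max())]
}

result = df.groupby('label').agg(agg_func)
# Drop the outer 'lvl' column level so that only 'length' remains
result.columns = result.columns.droplevel(0)

print(result)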

       length
label        
a           4
b           2
c           3
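
The lambda above packs several steps into one expression. As a rough sketch, the same idea can be unpacked into a named helper and applied to the visible rows of the table in the question. The helper name `longest_run` is hypothetical, and the `drop_duplicates` / `sort_values` step is an added assumption (that repeated labels within a level and row order should not change the count); neither is part of the original answer.

```python
import pandas as pd

# Visible rows of the question's table (the elided rows are not reproduced).
df = pd.DataFrame({
    "Levels": [0, 0, 0, 1, 1, 2, 2, 3],
    "Labels": ["Hands", "Leg", "Eye", "Ear", "Eye", "Hands", "Eye", "Eye"],
    "Confidence": [0.8, 0.7, 0.9, 0.9, 0.8, 0.9, 0.8, 0.8],
})


def longest_run(levels: pd.Series) -> int:
    """Length of the longest run of consecutive integer levels (hypothetical helper)."""
    # Assumption: duplicates and row order should not affect the result.
    s = levels.drop_duplicates().sort_values()
    # True wherever a level is not exactly one more than the previous one,
    # i.e. wherever a new run starts (the first element always starts a run).
    starts_new_run = s.diff().ne(1)
    # The cumulative sum of the start flags gives every run its own id.
    run_id = starts_new_run.cumsum()
    # The longest run is the run id that occurs most often.
    return int(run_id.value_counts().max())


# One value per label: the longest streak of consecutive levels it appears in.
result = df.groupby("Labels")["Levels"].agg(longest_run).rename("length")
print(result)
```

On these rows the result is 4 for "Eye" and 1 for "Hands", which matches the counts stated in the question. Here `s.diff().ne(1)` plays the same role as `x.ne((x+1).shift())` in the answer: both mark the positions where a level does not directly follow the previous one.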
