使用for循环创建其他列,这些列可根据多种条件计算百分比 [英] Using a for loop to create additional columns that calculate percentages based on multiple conditions

查看:129
本文介绍了使用for循环创建其他列,这些列可根据多种条件计算百分比的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含调查数据的数据框。调查数据评估了多个因素,例如教练,多元化,敬业度等。还有其他几列捕获人口统计数据(例如,年龄,部门等)。我想基于包含评级的列添加列。

I have a dataframe with survey data. The survey data assesses multiple factors like coaching, diversity, engagement etc. There are also several other columns which capture demographic data (e.g., age, department etc). I would like to add columns based on the columns that contain the ratings.

添加列的目的是为了a)提供良好的响应计数,b)获得百分比有利的答复(没有有利的答复/该因素中没有项目)和c)以在因素级别获得有利的答复的百分比(条件是,如果缺少任何项目的答复,则该因素将为NULL级别)

The purpose of adding the columns is to a) provide a count of Favourable responses, b) to get the percentage of Favourable responses (no of favourable responses / no of items in that factor) and c) to get the percentage of Favourable responses at the Factor level (with the condition that if there are missing reponses for any item, it would be NULL at the Factor level)

下表显示了所需的输出,其中仅包含教练项目。该表应包含其他评分列,并应适用于诸如多样性,领导力,敬业度等因素。

The table below shows the desired output where only Coaching items are factor are included. The table should contain other rating columns and should apply for Factors like Diversity, Leadership, Engagement etc.

Coach_q1     Coach_q2    Coach_q3      coach_fav_count  coach_fav_perc  coach_agg_perc 
Favourable   Neutral     Favourable    2                66%            66%        
Favourable   Favourable  Fabourable    3                100%           100%
NaN          Favourable  NaN           1                33%            NaN
Favourable   NaN         Favourable    2                66%            NaN         

以下代码可用于获取_favcount列和_fav%列。 ratingcollist用于仅将转换应用于具有这些前缀的列。但是,无法获得因子水平列,该列的目的是获得整个因子的有利回答的百分比-仅在针对该特定因子回答了所有问题的情况下(即,如果在特定范围内的任何项目中都缺少回答)因素,那么该因素将产生NaN值。)

The following code works in getting the _favcount columns and the _fav% columns. The ratingcollist is used to only apply the transformations on columns with those prefixes. However, am unable to get the factor level column which aims to get the percentage of favourable responses for the entire factor - ONLY if all questions were answered for that particular factor (i.e., if there were missing responses in any of the items within a particular factor, then the factor would yield a NaN value).

感谢我能获得的任何形式的帮助,谢谢。

Appreciate any form of help i can get, thank you.

ratingcollist = ['Coach_','Diversity_','Leadership_','Engagement_']


#create a for loop to get all the columns that match the column list keyword
for rat in ratingcollist:
    cols = df.filter(like=rat).columns

#create 2 new columns for each factor, one for count of Favourable responses and one for percentage of Favourable responses
    if len(cols) > 0:
        df[f'{rat.lower()}fav_count'] = (df[cols] == 'Favourable').sum(axis=1)
        df[f'{rat.lower()}fav_perc'] = (df[f'{rat.lower()}fav_count'] / len(cols)) * 100


推荐答案

DataFrame.notna DataFrame.all ,并且仅用于 DataFrame.loc

df = pd.DataFrame({'Coach_q1': ['Favourable', 'Favourable', np.nan, 'Favourable'], 'Coach_q2': ['Neutral', 'Favourable', 'Favourable', np.nan], 'Coach_q3': ['Favourable', 'Favourable', np.nan, 'Favourable']})





ratingcollist = ['Coach_','Diversity_','Leadership_','Engagement_']


#create a for loop to get all the columns that match the column list keyword
for rat in ratingcollist:
    cols = df.filter(like=rat).columns
    
    mask = df[cols].notna().all(axis=1)

#create 2 new columns for each factor, one for count 
#of Favourable responses and one for percentage of Favourable responses
    if len(cols) > 0:
        df[f'{rat.lower()}fav_count'] = (df[cols] == 'Favourable').sum(axis=1)
        df[f'{rat.lower()}fav_perc'] = (df.loc[mask, f'{rat.lower()}fav_count'] / len(cols)) * 100

print (df)
     Coach_q1    Coach_q2    Coach_q3  coach_fav_count  coach_fav_perc
0  Favourable     Neutral  Favourable                2       66.666667
1  Favourable  Favourable  Favourable                3      100.000000
2         NaN  Favourable         NaN                1             NaN
3  Favourable         NaN  Favourable                2             NaN

这篇关于使用for循环创建其他列,这些列可根据多种条件计算百分比的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆