Pandas: Create a column based on applying string conditions on another column

Question

I have a dataframe df as follows:

    KPI             Tata    JSW 
Gross Margin %      0.582   0.476   
EBITDA Margin %     0.191   0.23    
EBIT Margin %       0.145   0.183   
SG&A/Revenue        0.141   0.03    
COGS/Revenue        0.418   0.524   
CapE/Revenue        0.0577  0.1204      
ROA                 0.064   0.093   
ROE                 0.138   0.243       
Revenue/Employee $K 290.9   934.4   
Inventory Turnover  2.2     3.27    
AR Turnover         13.02   14.29   
Tot Asset Turnover  0.68    0.74    
Current Ratio       0.9     0.8 
Quick Ratio         0.3     0.4 

I am trying to add a column, say, scope based on the following criterion:

if df[df['KPI'].str.contains('Margin|Revenue|ROE|ROA')].shape[0] > 0:
  z = 'Max'
elif df[df['KPI'].str.contains('Quick|Current|Turnover')].shape[0] > 0:
  z = 'Min'

In other words, if the field KPI contains a word such as Revenue or Margin, the column scope should take Max, otherwise Min. There is one exception: when KPI == COGS/Revenue or KPI == CapEx/Revenue, scope should take Min even though the string Revenue is present.

So the resultant df should look like below:

To achieve this, I am trying to apply a function to the field KPI.

def scope_superlative(col_name):
  df_test = df[df[col_name].str.contains('Margin| Revenue|ROA|ROE')]
  if df_test.shape[0] > 0:
    z = 'Max'
  else:
    df_test = df[df[col_name].str.contains('/Revenue|Current|Quick|Turnover')] ##<-- I want to check if string 'Revenue' is in denominator.##
    if df_test.shape[0] > 0:
      z='Min'
  return z
##Applying this function##
df['scope'] = df.KPI.apply(lambda x : scope_superlative(x))

The above code raises KeyError: 'Gross Margin %'.

If I use df['scope'] = df.apply(scope_superlative('KPI'), axis=1) instead, I get AttributeError: 'DataFrame' object has no attribute 'Max'.

Can anybody please help with this? Is there anything wrong with the function or with the way I am applying it?

Recommended answer

I think you are looking for something like this:

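import re
import pandas as pd

def fn(row):
    # Check the exception first: 'Revenue' in the denominator, or a ratio/turnover KPI.
    if re.search(r'/Revenue|Current|Quick|Turnover', row['KPI']):
        return 'Min'
    # Any remaining Margin, Revenue, ROA or ROE KPI gets 'Max'.
    elif re.search(r'Margin|Revenue|ROA|ROE', row['KPI']):
        return 'Max'
    # Falls through and returns None when no pattern matches.

df = pd.read_csv('so.csv')  # the question's table saved as a CSV file

# axis=1 passes each row to fn, so no lambda wrapper is needed.
df['scope'] = df.apply(fn, axis=1)
print(df)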

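This simply uses df.apply() with axis=1, which passes each row to the provided function. The '/Revenue' pattern is tested first, so KPIs with Revenue in the denominator get Min before the general Revenue pattern can match.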
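As a sketch of why the two attempts in the question fail: Series.apply passes each cell value (a string such as 'Gross Margin %') to the function, so df[col_name] inside scope_superlative becomes df['Gross Margin %'] and raises KeyError. In the second attempt, scope_superlative('KPI') is called immediately and its return value, the string 'Max', is handed to df.apply, which appears to treat a string argument as the name of a DataFrame method to look up. A per-value variant avoids both problems. The helper name scope_for and the three-row sample frame below are assumptions made for illustration, not part of the original answer.

```python
import re
import pandas as pd

# Hypothetical three-row sample taken from the question's table.
df = pd.DataFrame({
    "KPI": ["Gross Margin %", "COGS/Revenue", "Quick Ratio"],
    "Tata": [0.582, 0.418, 0.3],
    "JSW": [0.476, 0.524, 0.4],
})

def scope_for(kpi):
    """Map one KPI label (a plain string) to 'Min' or 'Max'."""
    # Test '/Revenue' first so the exception wins over the general 'Revenue' rule.
    if re.search(r"/Revenue|Current|Quick|Turnover", kpi):
        return "Min"
    if re.search(r"Margin|Revenue|ROA|ROE", kpi):
        return "Max"
    return None  # no rule matched

# Series.apply hands scope_for one cell value at a time, not a column name.
df["scope"] = df["KPI"].apply(scope_for)
print(df)
```

Because the function only needs the KPI string, applying it to the single column is enough and no axis=1 row iteration is required.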

This gives the following result on the given data:

                    KPI      Tata       JSW scope
0        Gross Margin %    0.5820    0.4760   Max
1       EBITDA Margin %    0.1910    0.2300   Max
2         EBIT Margin %    0.1450    0.1830   Max
3          SG&A/Revenue    0.1410    0.0300   Min
4          COGS/Revenue    0.4180    0.5240   Min
5          CapE/Revenue    0.0577    0.1204   Min
6                   ROA    0.0640    0.0930   Max
7                   ROE    0.1380    0.2430   Max
8   Revenue/Employee $K  290.9000  934.4000   Max
9    Inventory Turnover    2.2000    3.2700   Min
10          AR Turnover   13.0200   14.2900   Min
11   Tot Asset Turnover    0.6800    0.7400   Min
12        Current Ratio    0.9000    0.8000   Min
13          Quick Ratio    0.3000    0.4000   Min
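As an alternative sketch (not part of the original answer), the same rule can be expressed without a Python-level function by combining Series.str.contains with numpy.select. The inline DataFrame reproduces only the KPI column from the question so that no so.csv file is needed; the names is_min and is_max and the 'Unknown' default are assumptions.

```python
import numpy as np
import pandas as pd

kpis = [
    "Gross Margin %", "EBITDA Margin %", "EBIT Margin %", "SG&A/Revenue",
    "COGS/Revenue", "CapE/Revenue", "ROA", "ROE", "Revenue/Employee $K",
    "Inventory Turnover", "AR Turnover", "Tot Asset Turnover",
    "Current Ratio", "Quick Ratio",
]
df = pd.DataFrame({"KPI": kpis})

# Boolean masks, one per rule; str.contains treats the pattern as a regex by default.
is_min = df["KPI"].str.contains(r"/Revenue|Current|Quick|Turnover")
is_max = df["KPI"].str.contains(r"Margin|Revenue|ROA|ROE")

# np.select uses the first condition that is True, so is_min takes priority.
df["scope"] = np.select([is_min, is_max], ["Min", "Max"], default="Unknown")
print(df)
```

Listing is_min before is_max plays the same role as testing '/Revenue' first in fn: a KPI such as COGS/Revenue matches both masks but is labelled Min.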

Hope this helps!
