pandas: create a new column based on values from other columns / apply a function of multiple columns, row-wise

Problem description

I want to apply my custom function (it uses an if-else ladder) to these six columns (ERI_Hispanic, ERI_AmerInd_AKNatv, ERI_Asian, ERI_Black_Afr.Amer, ERI_HI_PacIsl, ERI_White) in each row of my dataframe.

I've tried different methods from other questions but still can't find the right answer for my problem. The critical piece is that if a person is counted as Hispanic, they can't be counted as anything else. Even if they have a "1" in another ethnicity column, they are still counted as Hispanic, not as two or more races. Similarly, if the sum of all the ERI columns is greater than 1, they are counted as two or more races and can't be counted as a single ethnicity (except for Hispanic). Hopefully this makes sense. Any help will be greatly appreciated.

It's almost like doing a for loop through each row: if a record meets a criterion, it is added to one list and eliminated from the original.

From the dataframe below, I need to calculate a new column based on the following spec in SQL:

========================= CRITERIA ===============================

IF [ERI_Hispanic] = 1 THEN RETURN "Hispanic"
ELSE IF SUM([ERI_AmerInd_AKNatv] + [ERI_Asian] + [ERI_Black_Afr.Amer] + [ERI_HI_PacIsl] + [ERI_White]) > 1 THEN RETURN "Two or More"
ELSE IF [ERI_AmerInd_AKNatv] = 1 THEN RETURN "A/I AK Native"
ELSE IF [ERI_Asian] = 1 THEN RETURN "Asian"
ELSE IF [ERI_Black_Afr.Amer] = 1 THEN RETURN "Black/AA"
ELSE IF [ERI_HI_PacIsl] = 1 THEN RETURN "Haw/Pac Isl."
ELSE IF [ERI_White] = 1 THEN RETURN "White"

Comment: If the ERI Flag for Hispanic is True (1), the employee is classified as "Hispanic"

Comment: If more than 1 non-Hispanic ERI Flag is true, return "Two or More"

====================== DATAFRAME ===========================

     lname          fname       rno_cd  eri_afr_amer    eri_asian   eri_hawaiian    eri_hispanic    eri_nat_amer    eri_white   rno_defined
0    MOST           JEFF        E       0               0           0               0               0               1           White
1    CRUISE         TOM         E       0               0           0               1               0               0           White
2    DEPP           JOHNNY              0               0           0               0               0               1           Unknown
3    DICAP          LEO                 0               0           0               0               0               1           Unknown
4    BRANDO         MARLON      E       0               0           0               0               0               0           White
5    HANKS          TOM                 0               0           0               0               0               1           Unknown
6    DENIRO         ROBERT      E       0               1           0               0               0               1           White
7    PACINO         AL          E       0               0           0               0               0               1           White
8    WILLIAMS       ROBIN       E       0               0           1               0               0               0           White
9    EASTWOOD       CLINT       E       0               0           0               0               0               1           White
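
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample table as a pandas DataFrame. The values are transcribed from the table above. Treating the blank rno_cd cells as missing values (NaN) is an assumption on my part, and row 5 is read with a blank rno_cd.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the table above.
# Assumption: blank rno_cd cells are missing values, stored here as NaN.
df = pd.DataFrame({
    "lname": ["MOST", "CRUISE", "DEPP", "DICAP", "BRANDO",
              "HANKS", "DENIRO", "PACINO", "WILLIAMS", "EASTWOOD"],
    "fname": ["JEFF", "TOM", "JOHNNY", "LEO", "MARLON",
              "TOM", "ROBERT", "AL", "ROBIN", "CLINT"],
    "rno_cd": ["E", "E", np.nan, np.nan, "E", np.nan, "E", "E", "E", "E"],
    "eri_afr_amer": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "eri_asian":    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "eri_hawaiian": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "eri_hispanic": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "eri_nat_amer": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "eri_white":    [1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
    "rno_defined": ["White", "White", "Unknown", "Unknown", "White",
                    "Unknown", "White", "White", "White", "White"],
})

print(df)
```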

Recommended answer

OK, there are two steps to this. The first is to write a function that does the translation you want. I've put an example together based on your pseudo-code:

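def label_race(row):
    """Return the race/ethnicity label for one row of the dataframe."""
    # Hispanic takes precedence over every other flag.
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    # More than one non-Hispanic flag is set.
    non_hispanic_total = (row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian']
                          + row['eri_nat_amer'] + row['eri_white'])
    if non_hispanic_total > 1:
        return 'Two Or More'
    # Exactly one non-Hispanic flag (or none) is set from here on.
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    # No flag is set; the spec does not cover this case.
    return 'Other'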

You may want to go over this, but it seems to do the trick. Notice that the parameter passed to the function is a Series object, named "row" here.
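
To make the "row is a Series" point concrete, here is a small sketch. The two-row `demo` frame and the `describe_row` helper are hypothetical names made up for illustration; they are not part of the question's data.

```python
import pandas as pd

# Hypothetical two-row frame, only to show what apply passes to the function.
demo = pd.DataFrame({"eri_hispanic": [1, 0], "eri_white": [0, 1]})

def describe_row(row):
    # With axis=1, each call receives one row as a Series whose index
    # holds the column names, so row['eri_white'] looks up by column label.
    return f"{type(row).__name__}: {list(row.index)}"

row_descriptions = demo.apply(describe_row, axis=1)
print(row_descriptions.iloc[0])  # Series: ['eri_hispanic', 'eri_white']
```

This is why the function can index the row with the column names directly.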

Next, use the apply function in pandas to apply the function, e.g.

df.apply(label_race, axis=1)

Note the axis=1 specifier: it means that the function is applied at the row level rather than the column level. The results are here:

0           White
1        Hispanic
2           White
3           White
4           Other
5           White
6     Two Or More
7           White
8    Haw/Pac Isl.
9           White
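
If the effect of axis is unclear, the following sketch contrasts the two settings. The three-row `flags` frame is hypothetical and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical flag columns, made up for illustration.
flags = pd.DataFrame({"eri_asian": [0, 1, 0], "eri_white": [1, 1, 0]})

# axis=0 (the default): the function receives each column as a Series,
# so the result has one entry per column.
per_column = flags.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series,
# so the result has one entry per row.
per_row = flags.apply(lambda row: row.sum(), axis=1)

print(per_column.to_dict())  # {'eri_asian': 1, 'eri_white': 2}
print(per_row.tolist())      # [1, 2, 0]
```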

If you're happy with those results, then run it again, saving the results into a new column in your original dataframe.

df['race_label'] = df.apply(label_race, axis=1)

The resultant dataframe looks like this (scroll to the right to see the new column):

      lname   fname rno_cd  eri_afr_amer  eri_asian  eri_hawaiian   eri_hispanic  eri_nat_amer  eri_white rno_defined    race_label
0      MOST    JEFF      E             0          0             0              0             0          1       White         White
1    CRUISE     TOM      E             0          0             0              1             0          0       White      Hispanic
2      DEPP  JOHNNY    NaN             0          0             0              0             0          1     Unknown         White
3     DICAP     LEO    NaN             0          0             0              0             0          1     Unknown         White
4    BRANDO  MARLON      E             0          0             0              0             0          0       White         Other
5     HANKS     TOM    NaN             0          0             0              0             0          1     Unknown         White
6    DENIRO  ROBERT      E             0          1             0              0             0          1       White   Two Or More
7    PACINO      AL      E             0          0             0              0             0          1       White         White
8  WILLIAMS   ROBIN      E             0          0             1              0             0          0       White  Haw/Pac Isl.
9  EASTWOOD   CLINT      E             0          0             0              0             0          1       White         White
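
As a possible alternative, not part of the answer above, the same if-else ladder can be sketched with numpy.select, which evaluates each condition on whole columns and takes the first matching label per row. This may be worth trying on large frames, since a row-wise apply calls the Python function once per row. Only the flag columns from the sample data are rebuilt here, and the variable names `non_hispanic`, `conditions`, and `labels` are my own.

```python
import numpy as np
import pandas as pd

# Flag columns transcribed from the sample data (other columns omitted).
df = pd.DataFrame({
    "eri_afr_amer": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "eri_asian":    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "eri_hawaiian": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "eri_hispanic": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "eri_nat_amer": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "eri_white":    [1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
})

non_hispanic = ["eri_afr_amer", "eri_asian", "eri_hawaiian",
                "eri_nat_amer", "eri_white"]

# Conditions are checked in order; the first one that is True wins,
# which mirrors the order of the if statements in label_race.
conditions = [
    df["eri_hispanic"] == 1,
    df[non_hispanic].sum(axis=1) > 1,
    df["eri_nat_amer"] == 1,
    df["eri_asian"] == 1,
    df["eri_afr_amer"] == 1,
    df["eri_hawaiian"] == 1,
    df["eri_white"] == 1,
]
labels = ["Hispanic", "Two Or More", "A/I AK Native", "Asian",
          "Black/AA", "Haw/Pac Isl.", "White"]

# Rows matching no condition get the default label.
df["race_label"] = np.select(conditions, labels, default="Other")
print(df["race_label"].tolist())
```

On the sample data this should produce the same labels as the apply version, but it is worth checking against label_race on your real data before switching.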
