Python pandas: Updating row based on value from another column

Question

I have a pandas DataFrame, df, like:

name   | grade | grade_type
---------------------------
sarah  | B     | letter  
alice  | A     | letter
eliza  | C     | letter
beth   | 76    | numeral
jones  | 90    | numeral

All values in df are strings, including the numbers. I want to convert the numeric grade values into letters, based on the grade_type column, to get:

name   | grade | grade_type
---------------------------
sarah  | B     | letter  
alice  | A     | letter
eliza  | C     | letter
beth   | B     | numeral
jones  | A     | numeral

For completeness, the numeral-to-letter grade conversions are:

A: grade > 80
B: 70 < grade <= 80
C: 60 < grade <= 70

Why doesn't this work?

for index, row in df.iterrows():
  if row.grade_type == "numeral":
    grade_val = int(row.grade.values[0])
    if grade_val > 80:
      row.grade = "A" # This assignment doesn't update row.grade!
    elif...

The alternative is df.apply(...lambda:...), but I'm not sure how to pull that off, since we have to check the grade_type column before deciding whether to update the grade value.

Recommended answer

Your DataFrame doesn't update because iterrows() yields each row as a separate Series, which is typically a copy. Assigning to row.grade changes that copy, and pandas does not guarantee the write reaches df.

You can use the index returned by iterrows() and modify the DataFrame directly:

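for index, row in df.iterrows():
    if row['grade_type'] != 'numeral':
        continue  # letter grades are already in the target form
    grade_val = int(row['grade'])  # grades are stored as strings
    if grade_val > 80:
        df.loc[index, 'grade'] = 'A'
    ...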
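For reference, here is a minimal runnable sketch of the loop approach with the elided branches filled in from the thresholds in the question. The helper name to_letter is hypothetical, and leaving grades of 60 or below unchanged is an assumption, since the question gives no rule for them.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["sarah", "alice", "eliza", "beth", "jones"],
    "grade": ["B", "A", "C", "76", "90"],
    "grade_type": ["letter", "letter", "letter", "numeral", "numeral"],
})


def to_letter(grade_str):
    """Map a numeric grade string to a letter using the question's thresholds."""
    value = int(grade_str)  # grades are stored as strings
    if value > 80:
        return "A"
    if value > 70:
        return "B"
    if value > 60:
        return "C"
    # Assumption: no rule is given for 60 or below, so keep the input as-is.
    return grade_str


for index, row in df.iterrows():
    if row["grade_type"] == "numeral":
        # Write through df.loc so the change lands in df, not in the row Series.
        df.loc[index, "grade"] = to_letter(row["grade"])

result = df["grade"].tolist()
print(result)
```

The row Series is only read here; every write goes through df.loc with the row's own index label.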

Or, as you said, you can use df.apply() and pass it a custom function:

def get_grades(x):
    if x['grade_type'] == 'letter':
        return x['grade']
    grade_val = int(x['grade'])  # grades are stored as strings
    if grade_val > 80:
        return "A"
    ...


df['grade'] = df.apply(get_grades, axis=1)

You can also put an if/else in the lambda to check whether x['grade_type'] is 'numeral', as follows. Use whichever version you find easier to read.

def get_grades(grade_val):
    grade_val = int(grade_val)  # grades are stored as strings
    if grade_val > 80:
        return "A"
    ...

df['grade'] = df.apply(lambda x: get_grades(x['grade'])
                       if x['grade_type'] == 'numeral' else x['grade'], axis=1)
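As a sketch of how the conditional-lambda version might look once completed, the example below also includes two boundary grades (80 and 70) to show that the upper bounds in the question are inclusive. The rows amy and kate are hypothetical additions for illustration, and returning the input unchanged for grades of 60 or below is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    # "amy" and "kate" are hypothetical rows added to exercise the boundaries.
    "name": ["sarah", "beth", "jones", "amy", "kate"],
    "grade": ["B", "76", "90", "80", "70"],
    "grade_type": ["letter", "numeral", "numeral", "numeral", "numeral"],
})


def get_grades(grade_val):
    """Convert a numeric grade string to a letter grade."""
    value = int(grade_val)  # grades are stored as strings
    if value > 80:
        return "A"
    if value > 70:
        return "B"  # 70 < grade <= 80
    if value > 60:
        return "C"  # 60 < grade <= 70
    # Assumption: no rule is given for 60 or below, so return the input.
    return grade_val


# axis=1 passes each row to the lambda; letter rows pass through untouched.
df["grade"] = df.apply(
    lambda x: get_grades(x["grade"]) if x["grade_type"] == "numeral" else x["grade"],
    axis=1,
)

result = df["grade"].tolist()
print(result)
```

Because apply returns a new Series that is assigned back to the column, nothing is mutated during iteration.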
