Update value in one column if the string in another column contains something in a list

Question

  id name             gender
0 13 John Smith       0
1 46 Jim Jeffries     2
2 75 Jennifer Johnson 0
3 37 Sam Adams        0
4 24 John Cleese      0
5 17 Taika Waititi    0

I have a lot of people's names and genders in a df, taken from a film actors' db. Genders were assigned a 1 (female), 2 (male), or 0 (not listed). I'd like to comb through and callously assume genders by name. Names would be stored in lists that I fill out manually. If I spot somebody with a gender-nonspecific name and find out for myself whether they are male or female, I'd like to inject that by ID as well:

m_names = ['John', ...]
f_names = ['Jennifer', ...]
m_ids   = ['37', ...]
f_ids   = ['', ...]

I've got fine control of for loops and np.where, but I can't figure out how to get through this df, row by row.

Using the lists above, this is roughly what I am attempting, followed by the result I want:

for index, row in df.iterrows():
  if row['gender'] == 0:
    if   row['name'].str.contains(' |'.join(f_names)) or row['id'].str.contains('|'.join(f_ids)):
      return 1
    elif row['name'].str.contains(' |'.join(m_names)) or row['id'].str.contains('|'.join(m_ids)):
      return 2
print(df)

  id name             gender
0 13 John Smith       2
1 46 Jim Jeffries     2
2 75 Jennifer Johnson 1
3 37 Sam Adams        2
4 24 John Cleese      2
5 17 Taika Waititi    0

Note the space before '|' in the condition for names, to avoid grabbing any parts of last names.

At this point, I'm running into a wall with how I've formatted my if statements. Python doesn't like my formatting, and says my 'return's are 'outside function'. If I change these to

row['gender'] = #

I run into issues with unicode and my usage of 'str' and 'contains'.

Recommended answer

It seems like you need np.select and no for loops:

df['gender'] = np.select([df.name.str.contains(" |".join(m_names)),
                          df.name.str.contains(" |".join(f_names))],
                         [2, 1], 
                         default=3)
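
As written, the snippet above assigns 3 to every row that matches neither list, overwrites genders that were already recorded, and does not use the ID lists. The following is a minimal sketch of one way to get the output shown in the question, not a definitive implementation. It assumes that the `id` column holds strings (as the ID lists do), that only rows with gender 0 should change, and that a name matches when its first word is in the list. The sample lists contain only the entries visible in the question. Comparing the first word with `isin` replaces the `' |'.join(...)` regex, which leaves the last name in each list without its trailing space.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; ids are assumed to be strings.
df = pd.DataFrame({
    "id": ["13", "46", "75", "37", "24", "17"],
    "name": ["John Smith", "Jim Jeffries", "Jennifer Johnson",
             "Sam Adams", "John Cleese", "Taika Waititi"],
    "gender": [0, 2, 0, 0, 0, 0],
})

# Only the entries visible in the question; extend these by hand.
m_names = ["John"]
f_names = ["Jennifer"]
m_ids = ["37"]
f_ids = []

# First word of each name, so last names are never matched (assumption).
first_name = df["name"].str.split().str[0]

# Only rows whose gender is "not listed" (0) are eligible for an update.
unlisted = df["gender"].eq(0)

is_female = unlisted & (first_name.isin(f_names) | df["id"].isin(f_ids))
is_male = unlisted & (first_name.isin(m_names) | df["id"].isin(m_ids))

# Rows matching neither condition keep their current value.
df["gender"] = np.select([is_female, is_male], [1, 2], default=df["gender"])
print(df)
```

The loop in the question fails for reasons unrelated to formatting: `return` is only valid inside a function, `row['name']` is a plain Python string and so has no `.str` accessor, and assigning to `row` inside `iterrows()` does not reliably write back to `df`. Building boolean masks over whole columns avoids all three problems.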
