Update value in one column if the string in another column contains something in a list
Question
   id              name  gender
0  13        John Smith       0
1  46      Jim Jeffries       2
2  75  Jennifer Johnson       0
3  37         Sam Adams       0
4  24       John Cleese       0
5  17     Taika Waititi       0
I have a lot of people's names and genders in a df, taken from a film actors' db. Genders were assigned 1 (female), 2 (male), or 0 (not listed). I'd like to comb through and callously assume genders by name. The names would be stored in lists that I fill out manually. If I spot somebody with a gender-nonspecific name and find out myself whether they are male or female, I'd like to inject that as well, by ID:
m_names = ['John', ...]
f_names = ['Jennifer', ...]
m_ids = ['37', ...]
f_ids = ['', ...]
I'm comfortable with for loops and np.where, but I can't figure out how to get through this df row by row.
Using the lists above, this is roughly what I'm trying to do, followed by the result I want:
for index, row in df.iterrows():
    if row['gender'] == 0:
        if row['name'].str.contains(' |'.join(f_names)) or row['id'].str.contains('|'.join(f_ids)):
            return 1
        elif row['name'].str.contains(' |'.join(m_names)) or row['id'].str.contains('|'.join(m_ids)):
            return 2
print(df)
   id              name  gender
0  13        John Smith       2
1  46      Jim Jeffries       2
2  75  Jennifer Johnson       1
3  37         Sam Adams       2
4  24       John Cleese       2
5  17     Taika Waititi       0
Note the space before '|' in the condition for names, which is there to avoid matching parts of last names.
At this point I've hit a wall with how I've structured my if statements. Python rejects the code and says my 'return' statements are 'outside function'. If I change them to
row['gender'] = #
I run into issues with unicode and with my usage of 'str' and 'contains'.
Recommended answer
It looks like you need np.select, with no for loops:
import numpy as np

# Conditions are checked in order; rows matching neither list get the default.
df['gender'] = np.select([df.name.str.contains(" |".join(m_names)),
                          df.name.str.contains(" |".join(f_names))],
                         [2, 1],
                         default=3)
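As written, the np.select call gives every unmatched row the default value, including rows that already had a gender, and it does not use the ID lists. The sketch below is one possible variant, not the answerer's code, that aims for the output shown in the question. It makes several assumptions: the sample frame is rebuilt from the question, the lists are shortened to the entries the question actually shows, f_ids is left empty (an empty string in a regex alternation would match every row), and first_name_pattern is a hypothetical helper that anchors the match at the start of the name so that 'John' does not match the surname in 'Jennifer Johnson'.

```python
import re

import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": [13, 46, 75, 37, 24, 17],
    "name": ["John Smith", "Jim Jeffries", "Jennifer Johnson",
             "Sam Adams", "John Cleese", "Taika Waititi"],
    "gender": [0, 2, 0, 0, 0, 0],
})

# Lookup lists, shortened to the entries visible in the question.
m_names = ["John"]
f_names = ["Jennifer"]
m_ids = ["37"]
f_ids = []  # assumption: kept empty rather than [''] to avoid matching everything


def first_name_pattern(names):
    """Hypothetical helper: regex matching any of `names` as the first word."""
    return "^(?:" + "|".join(re.escape(n) for n in names) + ") "


ids = df["id"].astype(str)       # the ID lists hold strings
unlisted = df["gender"] == 0     # only fill in rows without a gender

is_f = df["name"].str.contains(first_name_pattern(f_names)) | ids.isin(f_ids)
is_m = df["name"].str.contains(first_name_pattern(m_names)) | ids.isin(m_ids)

# Rows that match neither condition keep their existing value.
df["gender"] = np.select([unlisted & is_f, unlisted & is_m],
                         [1, 2],
                         default=df["gender"])
print(df)
```

Passing the existing column as the default leaves rows such as Jim Jeffries (already 2) and Taika Waititi (no match) unchanged. Using isin for the IDs compares whole values, so '37' would not also match an ID such as 137 or 370.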