Pandas: split name column into first and last name if it contains one space

Problem description

Let's say I have a pandas DataFrame containing names like so:

import pandas as pd

name_df = pd.DataFrame({'name':['Jack Fine','Kim Q. Danger','Jane Smith', 'Juan de la Cruz']})

    name
0   Jack Fine
1   Kim Q. Danger
2   Jane Smith
3   Juan de la Cruz

I want to split the name column into first_name and last_name if the name contains exactly one space. Otherwise, I want the full name to be placed in first_name.

So the final DataFrame should look like:

  first_name     last_name
0 Jack           Fine
1 Kim Q. Danger
2 Jane           Smith
3 Juan de la Cruz

I've tried to accomplish this by first applying the following function, which should return the names that can be split into a first and last name:

import re

def validate_single_space_name(name: str) -> str:
    pattern = re.compile(r'^.*( ){1}.*$')
    match_obj = re.match(pattern, name)
    if match_obj:
        return name
    else:
        return None

However, applying this function to my original name_df leads to an empty DataFrame, not one populated by the names that can be split and None for the rest.

Help getting my current approach to work, or solutions involving a different approach, would be appreciated!

Recommended answer

You can use str.split to split the strings, test the number of components using str.len, and use the result as a boolean mask so that only rows with exactly two components are assigned the last component of the split (df here is the question's name_df):

In [33]:
df.loc[df['name'].str.split().str.len() == 2, 'last name'] = df['name'].str.split().str[-1]
df

Out[33]:
              name last name
0        Jack Fine      Fine
1    Kim Q. Danger       NaN
2       Jane Smith     Smith
3  Juan de la Cruz       NaN
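
To see why the mask selects only the two-word names, it can help to look at the intermediate values. This is a minimal sketch, assuming the name_df from the question; the variable names parts, n_parts, and mask are illustrative only:

```python
import pandas as pd

name_df = pd.DataFrame(
    {"name": ["Jack Fine", "Kim Q. Danger", "Jane Smith", "Juan de la Cruz"]}
)

# Split each name on whitespace; every row becomes a list of words.
parts = name_df["name"].str.split()

# Number of words in each row.
n_parts = parts.str.len()

# True only for rows with exactly two words (one separating space).
mask = n_parts == 2

print(n_parts.tolist())  # [2, 3, 2, 4]
print(mask.tolist())     # [True, False, True, False]
```

Because the right-hand side of the assignment is aligned on the index, only the rows where the mask is True receive a value; the others stay NaN.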

Edit

You can call split with the parameter expand=True. Selecting the rows whose names have exactly two components first means that only those rows are populated:

In [16]:
name_df[['first_name','last_name']] = name_df['name'].loc[name_df['name'].str.split().str.len() == 2].str.split(expand=True)
name_df

Out[16]:
              name first_name last_name
0        Jack Fine       Jack      Fine
1    Kim Q. Danger        NaN       NaN
2       Jane Smith       Jane     Smith
3  Juan de la Cruz        NaN       NaN
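
The row filter matters because expand=True creates one column per word of the longest name in the Series. A small sketch, again assuming the question's name_df (unfiltered, two_word, and filtered are hypothetical names), compares the two shapes:

```python
import pandas as pd

name_df = pd.DataFrame(
    {"name": ["Jack Fine", "Kim Q. Danger", "Jane Smith", "Juan de la Cruz"]}
)

# Without filtering, the longest name ("Juan de la Cruz") forces four columns.
unfiltered = name_df["name"].str.split(expand=True)
print(unfiltered.shape)  # (4, 4)

# Filtering to two-word names first yields exactly two columns and keeps
# the original index labels, so the assignment can align on them.
two_word = name_df["name"].str.split().str.len() == 2
filtered = name_df.loc[two_word, "name"].str.split(expand=True)
print(filtered.shape)           # (2, 2)
print(filtered.index.tolist())  # [0, 2]
```

Assigning the unfiltered four-column result to two target columns would not fit, which is why the filter comes before the split.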

You can then replace the missing first names using fillna:

In [17]:
name_df['first_name'] = name_df['first_name'].fillna(name_df['name'])
name_df

Out[17]:
              name       first_name last_name
0        Jack Fine             Jack      Fine
1    Kim Q. Danger    Kim Q. Danger       NaN
2       Jane Smith             Jane     Smith
3  Juan de la Cruz  Juan de la Cruz       NaN
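
The steps above can be combined into one function. This is a minimal sketch rather than a definitive implementation: the helper name split_single_space_names and its col parameter are hypothetical, it assumes at least one name has exactly two words, and it assigns the fillna result back to the column rather than modifying it in place.

```python
import pandas as pd


def split_single_space_names(df: pd.DataFrame, col: str = "name") -> pd.DataFrame:
    """Return a copy of df with first_name and last_name columns added."""
    out = df.copy()

    # Rows whose name consists of exactly two whitespace-separated words.
    two_word = out[col].str.split().str.len() == 2

    # Split only those rows; all other rows align to NaN on assignment.
    # Assumption: at least one row is two-word, so the split has two columns.
    out[["first_name", "last_name"]] = out.loc[two_word, col].str.split(expand=True)

    # Names that were not split go into first_name in full.
    out["first_name"] = out["first_name"].fillna(out[col])
    return out


name_df = pd.DataFrame(
    {"name": ["Jack Fine", "Kim Q. Danger", "Jane Smith", "Juan de la Cruz"]}
)
result = split_single_space_names(name_df)
print(result[["first_name", "last_name"]])
```

Working on a copy leaves the original name_df untouched, and last_name stays missing (NaN) for the names that were not split.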
