Pandas create Foreign ID column based on Name column
Problem description
For example, I have a simple dataframe like this:
import pandas as pd
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'John Doe', 'Jane Smith', 'Jack Dawson', 'John Doe']})
df:
Name
0 John Doe
1 Jane Smith
2 John Doe
3 Jane Smith
4 Jack Dawson
5 John Doe
I want to add a column, 'Foreign_Key', that assigns a unique ID to each unique name (rows with the same name should share the same 'Foreign_Key'). The final output should look like this:
df:
Name Foreign_Key
0 John Doe foreignkey1
1 Jane Smith foreignkey2
2 John Doe foreignkey1
3 Jane Smith foreignkey2
4 Jack Dawson foreignkey3
5 John Doe foreignkey1
I'm trying to use groupby with a custom function applied to each group. My first step is:
name_groupby = df.groupby('Name')
That's the split step; next come the apply and combine steps. I couldn't find anything like this example in the docs, and I'm unsure where to go from here.
The custom function I started writing looks like this:
def make_foreign_key(groupby_df):
    # Incomplete attempt: an assignment inside `return` is a syntax error, and `num` is undefined
    return groupby_df['Foreign_Key'] = 'foreign_key' + num
Any help would be greatly appreciated!
Recommended answer
You can do it like this:
pd.merge(
df,
pd.DataFrame(df.Name.unique(), columns=['Name']).reset_index().rename(columns={'index': 'Foreign_Key'}),
on='Name'
)
Name Foreign_Key
0 John Doe 0
1 John Doe 0
2 Jane Smith 1
3 Jane Smith 1
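The merge above yields integer IDs rather than the 'foreignkeyN' labels shown in the question. As a minimal sketch of one alternative (not the method from the answer), pd.factorize can number the names in order of first appearance, and the labels can then be built from those codes. The 1-based numbering and the 'foreignkey' prefix are assumptions taken from the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'John Doe',
                            'Jane Smith', 'Jack Dawson', 'John Doe']})

# pd.factorize returns integer codes numbered in order of first appearance,
# so every row with the same name receives the same code.
codes, uniques = pd.factorize(df['Name'])

# Build labels in the 'foreignkeyN' format from the question (1-based, assumed).
df['Foreign_Key'] = ['foreignkey' + str(code + 1) for code in codes]

print(df)
```

If plain integer IDs are enough, df.groupby('Name', sort=False).ngroup() should give the same codes while staying closer to the groupby approach attempted in the question.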