Efficient way to assign values from another column pandas df
Question
I'm trying to create a more efficient script that creates a new column based on the values in another column. The script below does this, but I can only select one string at a time. I'd like to do this for all individual values.
For the df below, I'm currently running the script on each individual string in Location. However, I want to run the script on all unique strings.
Description of how the new column is assigned: each individual string in Location gets a value for its first 3 unique items in Day. So, for each value in Location, a new string is assigned to the first 3 unique values in Day, and each following group of 3 unique days gets the next new string.
import pandas as pd
import numpy as np
d = ({
'Day' : ['Mon','Tues','Wed','Wed','Thurs','Thurs','Fri','Mon','Sat','Fri','Sun'],
'Location' : ['Home','Home','Away','Home','Away','Home','Home','Home','Home','Away','Home'],
})
df = pd.DataFrame(data=d)
#Select value
mask = df['Location'] == 'Home'
df1 = df[mask].drop_duplicates('Day')
d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))
df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)
At the moment I'm selecting each value in ['Location'] one at a time, e.g. mask = df['Location'] == 'Home'.
I want to do it on all values, e.g. mask = df['Location'] == All unique values
Expected output:
     Day Location Assign
0    Mon     Home     C1
1   Tues     Home     C1
2    Wed     Away     C2
3    Wed     Home     C1
4  Thurs     Away     C2
5  Thurs     Home     C3
6    Fri     Home     C3
7    Mon     Home     C1
8    Sat     Home     C3
9    Fri     Away     C2
10   Sun     Home     C4
Recommended answer
You can use:
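def f(x):
    # unique days of this Location, in order of first appearance
    u = x['Day'].unique()
    # mapping dictionary: every 3 unique days share one block number
    d = dict(zip(u, np.arange(len(u)) // 3 + 1))
    return x.assign(new=x['Day'].map(d))
# group_keys=False keeps the original row index and order;
# newer pandas versions otherwise prepend the group key to the index
df = df.groupby('Location', sort=False, group_keys=False).apply(f)
# append Location so block numbers of different locations stay distinct
s = df['new'].astype(str) + df['Location']
# encode each distinct label as C1, C2, ... with factorize
df['new'] = pd.Series(pd.factorize(s)[0] + 1, index=df.index).map(str).radd('C')
print(df)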
     Day Location new
0    Mon     Home  C1
1   Tues     Home  C1
2    Wed     Away  C2
3    Wed     Home  C1
4  Thurs     Away  C2
5  Thurs     Home  C3
6    Fri     Home  C3
7    Mon     Home  C1
8    Sat     Home  C3
9    Fri     Away  C2
10   Sun     Home  C4
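If you would rather avoid groupby(...).apply, the same labelling can probably be built from groupby(...).transform and pd.factorize alone. The following is only a sketch under that assumption, not part of the original answer; the intermediate names day_rank, block and key are made up for illustration, and the result is written to the Assign column used in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Mon', 'Tues', 'Wed', 'Wed', 'Thurs', 'Thurs',
            'Fri', 'Mon', 'Sat', 'Fri', 'Sun'],
    'Location': ['Home', 'Home', 'Away', 'Home', 'Away', 'Home',
                 'Home', 'Home', 'Home', 'Away', 'Home'],
})

# Rank of each day within its Location, by order of first appearance
# (hypothetical helper name: day_rank)
day_rank = (
    df.groupby('Location')['Day']
      .transform(lambda s: pd.factorize(s)[0])
      .astype(int)
)

# Every 3 unique days per Location form one block: 0, 0, 0, 1, 1, 1, ...
block = day_rank // 3

# Combine Location and block so blocks of different locations stay distinct
key = df['Location'] + '_' + block.astype(str)

# Number the distinct keys in order of first appearance: C1, C2, ...
df['Assign'] = 'C' + pd.Series(pd.factorize(key)[0] + 1, index=df.index).astype(str)

print(df)
```

Because transform returns values aligned to the original index, this version does not depend on how apply reassembles the groups, which has changed between pandas versions.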