Efficient way to assign values from another column in a pandas DataFrame

Question

I'm trying to write a more efficient script that creates a new column based on the values in another column. The script below does this, but only for one string at a time. I'd like to apply it to every unique value.

For the df below, I currently run the script separately for each string in Location. Instead, I want to run it once across all unique strings.

How the new column is assigned: within each Location, the unique values in Day are taken in order of first appearance and grouped in threes. Each group of three unique days gets its own label, so the first 3 unique days for a Location share one label, the next 3 share another, and so on.

import pandas as pd
import numpy as np

d = ({
    'Day' : ['Mon','Tues','Wed','Wed','Thurs','Thurs','Fri','Mon','Sat','Fri','Sun'],                 
    'Location' : ['Home','Home','Away','Home','Away','Home','Home','Home','Home','Away','Home'],        
    })

df = pd.DataFrame(data=d)

#Select value
mask = df['Location'] == 'Home'
df1 = df[mask].drop_duplicates('Day')
d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))

df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)

At the moment I select each value in Location by hand, e.g. mask = df['Location'] == 'Home'.

I want to do this for all values at once, e.g. something like mask = df['Location'] == <all unique values>.
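For illustration, the most literal reading of "all unique values" is to repeat the single-value logic in a loop over `df['Location'].unique()`. The sketch below does that. The column name `Block` is made up for this example, and the result is only the per-Location block number (1, 2, 3, ...), not the final `C1`, `C2`, ... labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Day': ['Mon', 'Tues', 'Wed', 'Wed', 'Thurs', 'Thurs',
            'Fri', 'Mon', 'Sat', 'Fri', 'Sun'],
    'Location': ['Home', 'Home', 'Away', 'Home', 'Away', 'Home',
                 'Home', 'Home', 'Home', 'Away', 'Home'],
})

# Repeat the single-value logic once per unique Location.
for loc in df['Location'].unique():
    mask = df['Location'] == loc
    # Unique days for this Location, in order of first appearance.
    days = df.loc[mask, 'Day'].unique()
    # Every 3 unique days share one block number: 1, 1, 1, 2, 2, 2, ...
    mapping = dict(zip(days, np.arange(len(days)) // 3 + 1))
    # 'Block' is a hypothetical column name used only in this sketch.
    df.loc[mask, 'Block'] = df.loc[mask, 'Day'].map(mapping)

# The column is created as float while partially filled; cast once complete.
df['Block'] = df['Block'].astype(int)
print(df)
```

This still runs one masked assignment per Location, so it is a baseline rather than the efficient version being asked for.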

Expected output:

      Day Location Assign
0     Mon     Home     C1
1    Tues     Home     C1
2     Wed     Away     C2
3     Wed     Home     C1
4   Thurs     Away     C2
5   Thurs     Home     C3
6     Fri     Home     C3
7     Mon     Home     C1
8     Sat     Home     C3
9     Fri     Away     C2
10    Sun     Home     C4

Recommended answer

You can use:

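def f(days):
    # unique days of this Location, in order of first appearance
    u = days.unique()
    # mapping dictionary: every 3 unique days share one block number
    d = dict(zip(u, np.arange(len(u)) // 3 + 1))
    return days.map(d)

# block number per row, computed within each Location;
# transform keeps the original row order and index
df['new'] = df.groupby('Location', sort=False)['Day'].transform(f)
# combine block number with Location, e.g. '1Home', '1Away'
s = df['new'].astype(str) + df['Location']
# encode each combination by order of first appearance: C1, C2, ...
df['new'] = 'C' + pd.Series(pd.factorize(s)[0] + 1, index=df.index).astype(str)
print(df)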
      Day Location new
0     Mon     Home  C1
1    Tues     Home  C1
2     Wed     Away  C2
3     Wed     Home  C1
4   Thurs     Away  C2
5   Thurs     Home  C3
6     Fri     Home  C3
7     Mon     Home  C1
8     Sat     Home  C3
9     Fri     Away  C2
10    Sun     Home  C4
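The last step relies on `pd.factorize`, which numbers distinct values in order of first appearance, starting at 0. A minimal sketch on a hand-written sample (the strings below are illustrative, not taken from the full df):

```python
import pandas as pd

# Combined "block number + Location" keys, as built in the answer.
s = pd.Series(['1Home', '1Home', '1Away', '2Home', '1Away'])

# codes: integer id per row; uniques: distinct values in first-seen order.
codes, uniques = pd.factorize(s)

# Shift to 1-based ids and prefix with 'C' to get the final labels.
labels = 'C' + pd.Series(codes + 1, index=s.index).astype(str)
print(labels.tolist())
```

Because the ids follow first appearance, the label numbering depends on row order: the first combination seen becomes C1, the next new one C2, and so on.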
