在pandas df中重新分配列值 [英] Re-assign column values in a pandas df
问题描述
此问题与花名册或人员配置有关.我正在尝试将各种工作分配给个人(员工).使用下面的df
This question is related to rostering or staffing. I'm trying to assign various jobs to individuals (employees). Using the df
below,
`[Person]` = Individuals (employees)
`[Area]` and `[Place]` = unique jobs
`[On]` = How many unique jobs are occurring at each point in time
因此,[Area]
和[Place]
一起将构成不同工作的unique
值.这些值将分配给个人,其总体目标是使用尽可能少的个人.对于任何一个人来说,最唯一的值assigned
是3.[On]
显示[Place]
和[Area]
的当前unique
值的数量.因此,这为我需要多少个人提供了具体的指导.例如
So [Area]
and [Place]
together will make up unique
values that are different jobs. These values will be assigned to individuals with the overall aim to use the least amount of individuals possible. The most unique values assigned
to any one individual is 3. [On]
displays how many current unique
values for [Place]
and [Area]
are occurring. So this provides a concrete guide on how many individuals I need. For example,
1-3 unique values occurring = 1 individual
4-6 unique values occurring = 2 individuals
7-9 unique values occurring = 3 individuals etc
问题:
[Area]
和[Place]
中的unique
值的数量大于3的情况给我造成了麻烦.我无法执行groupby
,其中我assign
第一个3 unique values
至individual 1
以及接下来的3个unique
值至individual 2
等.我想将[Area]
和[Area]
.因此,将[Area]
中与assign
相同的值用于一个个体(最多3个).然后,如果有剩余个值(<3),则应将它们合并为一组(如果可能),为3.
Question:
Where the amount of unique
values in [Area]
and [Place]
is greater than 3 is causing me trouble. I can't do a groupby
where I assign
the first 3 unique values
to individual 1
and the next 3 unique
values to individual 2
etc. I want to group unique values in [Area]
and [Place]
by [Area]
. So look to assign
same values in [Area]
to an individual (up to 3). Then, if there are leftover values (<3), they should be combined to make a group of 3, where possible.
我设想此工作的方式是:hour
展望未来.对于每个新的row
值,script
应该查看多少个[On]
值(这表明需要多少个人).如果unique
值> 3,则它们应为assigned
乘以grouping
,与[Area]
中的值相同.如果有剩余个值,则应将它们组合起来以组成3个一组.
The way I envisage this working is: see into the future by an hour
. For each new row
of values the script
should see how many values will be [On]
(this provides an indication of how many total individuals are required). Where unique
values are >3, they should be assigned
by grouping
the same value in [Area]
. If there are leftover values they should be combined anyhow to make up to a group of 3.
将其逐步进行:
1)使用[On]
Column
来确定需要多少人,方法是展望未来寻找hour
1) Use the [On]
Column
to determine how many individuals are required by looking into the future for an hour
2)如果出现超过3个unique
值,请在[Area]
中首先分配相同的值.
2) Where there are more than 3 unique
values occurring assign the identical values in [Area]
first.
3)如果有任何剩余值,则无论如何都希望合并.
3) If there are any leftover values then look to combine anyway possible.
对于下面的df
,对于[Place]
和[Area]
以及hour
,有9个unique
值出现.因此,我们应该有3个人assigned
.当unique
值> 3时,应由[Area]
分配,并查看是否出现相同的值. 剩余值应与其他少于3个unique
值的个体组合.
For the df
below, there are 9 unique
values occurring for [Place]
and [Area]
with an hour
. So we should have 3 individuals assigned
. When unique
values >3 it should be assigned by [Area]
and seeing if the same value occurs. The leftover values should be combined with other individuals that have less than 3 unique
values.
import pandas as pd
import numpy as np
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 5','House 1','House 2','House 3','House 2'],
'Area' : ['A','B','C','D','E','D','E','F','G'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 2','Person 3','Person 4','Person 5','Person 4','Person 5','Person 6','Person 7'],
})
df = pd.DataFrame(data=d)
这是我的尝试:
def reduce_df(df):
values = df['Area'] + df['Place']
df1 = df.loc[~values.duplicated(),:] # ignore duplicate values for this part..
person_count = df1.groupby('Person')['Person'].agg('count')
leftover_count = person_count[person_count < 3] # the 'leftovers'
# try merging pairs together
nleft = leftover_count.shape[0]
to_try = np.arange(nleft - 1)
to_merge = (leftover_count.values[to_try] +
leftover_count.values[to_try + 1]) <= 3
to_merge[1:] = to_merge[1:] & ~to_merge[:-1]
to_merge = to_try[to_merge]
merge_dict = dict(zip(leftover_count.index.values[to_merge+1],
leftover_count.index.values[to_merge]))
def change_person(p):
if p in merge_dict.keys():
return merge_dict[p]
return p
reduced_df = df.copy()
# update df with the merges you found
reduced_df['Person'] = reduced_df['Person'].apply(change_person)
return reduced_df
df1 = (reduce_df(reduce_df(df)))
这是输出:
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 4
4 8:35:00 House 5 E 5 Person 5
5 8:40:00 House 1 D 6 Person 4
6 8:42:00 House 2 E 7 Person 5
7 8:45:00 House 3 F 8 Person 5
8 8:50:00 House 2 G 9 Person 7
这是我的预期输出:
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 5 E 5 Person 3
5 8:40:00 House 6 D 6 Person 2
6 8:42:00 House 2 E 7 Person 3
7 8:45:00 House 3 F 8 Person 2
8 8:50:00 House 2 G 9 Person 3
关于如何获取此输出的说明:
Description on how I want to get this output:
Index 0: One `unique` value occurring. So `assign` to individual 1
Index 1: Two `unique` values occurring. So `assign` to individual 1
Index 2: Three `unique` values occurring. So `assign` to individual 1
Index 3: Four `unique` values on. So `assign` to individual 2
Index 4: Five `unique` values on. This one is a bit tricky and hard to conceptualise. But there is another `E` within an `hour`. So `assign` to a new individual so it can be combined with the other `E`
Index 5: Six `unique` values on. Should be `assigned` with the other `D`. So individual 2
Index 6: Seven `unique` values on. Should be `assigned` with other `E`. So individual 3
Index 7: Eight `unique` values on. New value in `[Area]`, which is a _leftover_. `Assign` to either individual 2 or 3
Index 8: Nine `unique` values on. New value in `[Area]`, which is a _leftover_. `Assign` to either individual 2 or 3
示例2:
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','8:40:00','8:42:00','8:45:00','8:50:00'],
'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 1','House 2','House 3'],
'Area' : ['X','X','X','X','X','X','X','X','X'],
'On' : ['1','2','3','3','3','3','3','3','3'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1'],
})
df = pd.DataFrame(data=d)
我遇到错误:
IndexError: index 1 is out of bounds for axis 1 with size 1
在这一行:
df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
但是,如果我将Person更改为1,2,3重复,它将返回以下内容:
However, if I change the Person to 1,2,3 repeating, it returns the following:
'Person' : ['Person 1','Person 2','Person 3','Person 1','Person 2','Person 3','Person 1','Person 2','Person 3'],
Time Place Area On Person
0 8:03:00 House 1 X 1 Person 1
1 8:17:00 House 2 X 2 Person 1
2 8:20:00 House 3 X 3 Person 1
3 8:28:00 House 1 X 3 Person 2
4 8:35:00 House 2 X 3 Person 2
5 8:40:00 House 3 X 3 Person 2
6 8:42:00 House 1 X 3 Person 3
7 8:45:00 House 2 X 3 Person 3
8 8:50:00 House 3 X 3 Person 3
预期输出:
Time Place Area On Person
0 8:03:00 House 1 X 1 Person 1
1 8:17:00 House 2 X 2 Person 1
2 8:20:00 House 3 X 3 Person 1
3 8:28:00 House 1 X 3 Person 1
4 8:35:00 House 2 X 3 Person 1
5 8:40:00 House 3 X 3 Person 1
6 8:42:00 House 1 X 3 Person 1
7 8:45:00 House 2 X 3 Person 1
8 8:50:00 House 3 X 3 Person 1
示例2的主要内容是:
1) There are <3 unique values on so assign to individual 1
推荐答案
更新
以下是allocatePeople
函数形式的答案.它基于预先计算一个小时内重复出现的所有索引的所有索引:
Here's an answer in the form of the allocatePeople
function. It's based around precomputing all of the indices where the areas repeat within an hour:
from collections import Counter
import numpy as np
import pandas as pd
def getAssignedPeople(df, areasPerPerson):
areas = df['Area'].values
places = df['Place'].values
times = pd.to_datetime(df['Time']).values
maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
assignmentCount = Counter()
assignedPeople = []
assignedPlaces = {}
heldPeople = {}
heldAreas = {}
holdAvailable = True
person = 0
# search for repeated areas. Mark them if the next repeat occurs within an hour
ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
holds = np.zeros(areas.size, dtype=bool)
holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')
for area,place,hold in zip(areas, places, holds):
if (area, place) in assignedPlaces:
# this unique (area, place) has already been assigned to someone
assignedPeople.append(assignedPlaces[(area, place)])
continue
if assignmentCount[person] >= areasPerPerson:
# the current person is already assigned to enough areas, move on to the next
a = heldPeople.pop(person, None)
heldAreas.pop(a, None)
person += 1
if area in heldAreas:
# assign to the person held in this area
p = heldAreas.pop(area)
heldPeople.pop(p)
else:
# get the first non-held person. If we need to hold in this area,
# also make sure the person has at least 2 free assignment slots,
# though if it's the last person assign to them anyway
p = person
while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
p += 1
assignmentCount.update([p])
assignedPlaces[(area, place)] = p
assignedPeople.append(p)
if hold:
if p==maxPerson:
# mark that there are no more people available to perform holds
holdAvailable = False
# this area recurrs in an hour, mark that the person should be held here
heldPeople[p] = area
heldAreas[area] = p
return assignedPeople
def allocatePeople(df, areasPerPerson=3):
assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
df = df.copy()
df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
return df
请注意在allocatePeople
中使用df['Person'].unique()
.处理输入中重复人员的情况.假定输入中人员的顺序是应该分配这些人员的所需顺序.
Note the use of df['Person'].unique()
in allocatePeople
. That handles the case where people are repeated in the input. It is assumed that the order of people in the input is the desired order in which those people should be assigned.
我针对OP的示例输入(example1
和example2
)以及我想到的一些边缘情况(认为)符合OP所需的算法对allocatePeople
进行了测试:
I tested allocatePeople
against the OP's example input (example1
and example2
) and also against a couple of edge cases I came up with that I think(?) match the OP's desired algorithm:
ds = dict(
example1 = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 5','House 1','House 2','House 3','House 2'],
'Area' : ['A','B','C','D','E','D','E','F','G'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 2','Person 3','Person 4','Person 5','Person 4','Person 5','Person 6','Person 7'],
}),
example2 = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','8:40:00','8:42:00','8:45:00','8:50:00'],
'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 1','House 2','House 3'],
'Area' : ['X','X','X','X','X','X','X','X','X'],
'On' : ['1','2','3','3','3','3','3','3','3'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1'],
}),
long_repeats = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:25:00','8:30:00','8:31:00','8:35:00','8:45:00','8:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 3','House 2'],
'Area' : ['A','A','A','A','B','C','C','C','B'],
'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 4','Person 4','Person 3'],
'On' : ['1','2','3','4','5','6','7','8','9'],
}),
many_repeats = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 2'],
'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'F'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],
}),
large_gap = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 3'],
'Area' : ['A', 'B', 'C', 'D', 'E', 'F', 'D', 'D', 'D'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],
}),
different_times = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','09:42:00','09:45:00','09:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 1'],
'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'G'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],
})
)
expectedPeoples = dict(
example1 = [1,1,1,2,3,2,3,2,3],
example2 = [1,1,1,1,1,1,1,1,1],
long_repeats = [1,1,1,2,2,3,3,3,2],
many_repeats = [1,1,1,2,2,3,3,2,3],
large_gap = [1,1,1,2,3,3,2,2,3],
different_times = [1,1,1,2,2,2,3,3,3],
)
for name,d in ds.items():
df = pd.DataFrame(d)
expected = ['Person %d' % i for i in expectedPeoples[name]]
ap = allocatePeople(df)
print(name, ap, sep='\n', end='\n\n')
np.testing.assert_array_equal(ap['Person'], expected)
assert_array_equal
语句通过,并且输出与OP的预期输出匹配:
The assert_array_equal
statements pass, and the output matches OP's expected output:
example1
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 5 E 5 Person 3
5 08:40:00 House 1 D 6 Person 2
6 08:42:00 House 2 E 7 Person 3
7 08:45:00 House 3 F 8 Person 2
8 08:50:00 House 2 G 9 Person 3
example2
Time Place Area On Person
0 8:03:00 House 1 X 1 Person 1
1 8:17:00 House 2 X 2 Person 1
2 8:20:00 House 3 X 3 Person 1
3 8:28:00 House 1 X 3 Person 1
4 8:35:00 House 2 X 3 Person 1
5 8:40:00 House 3 X 3 Person 1
6 8:42:00 House 1 X 3 Person 1
7 8:45:00 House 2 X 3 Person 1
8 8:50:00 House 3 X 3 Person 1
我的测试用例的输出也符合我的期望:
The output for my test cases matches my expectations as well:
long_repeats
Time Place Area Person On
0 8:03:00 House 1 A Person 1 1
1 8:17:00 House 2 A Person 1 2
2 8:20:00 House 3 A Person 1 3
3 8:25:00 House 4 A Person 2 4
4 8:30:00 House 1 B Person 2 5
5 8:31:00 House 1 C Person 3 6
6 8:35:00 House 2 C Person 3 7
7 8:45:00 House 3 C Person 3 8
8 8:50:00 House 2 B Person 2 9
many_repeats
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 1 D 5 Person 2
5 08:40:00 House 1 E 6 Person 3
6 08:42:00 House 2 E 7 Person 3
7 08:45:00 House 1 F 8 Person 2
8 08:50:00 House 2 F 9 Person 3
large_gap
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 1 E 5 Person 3
5 08:40:00 House 1 F 6 Person 3
6 08:42:00 House 2 D 7 Person 2
7 08:45:00 House 1 D 8 Person 2
8 08:50:00 House 3 D 9 Person 3
different_times
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 1 D 5 Person 2
5 08:40:00 House 1 E 6 Person 2
6 09:42:00 House 2 E 7 Person 3
7 09:45:00 House 1 F 8 Person 3
8 09:50:00 House 1 G 9 Person 3
让我知道它是否可以完成您想要的一切,或者是否仍需要进行一些调整.我认为每个人都渴望看到您实现您的愿景.
Let me know if it does everything you wanted, or if it still needs some tweaks. I think everyone is eager to see you fulfill your vision.
这篇关于在pandas df中重新分配列值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!