重新分配唯一值-Pandas DataFrame [英] Re-assign unique values - pandas DataFrame
问题描述
我正在尝试将pandas
df
中的assign
unique
值分配给特定的个人.
I am trying to assign
unique
values in pandas
df
to specific individuals.
对于下面的df
,[Area]
和[Place]
将共同构成不同的职位的unique
值.这些值将分配给个人,其总体目标是使用尽可能少的个人.
For the df
below, [Area]
and [Place]
will together make up unique
values that are various jobs. These values will be assigned to individuals with the overall aim to use the least amount of individuals possible.
诀窍是这些值会不断地开始和结束,并持续不同的时间长度.一次分配给一个人的最多unique
个值是 3 . [On]
显示[Place]和[Area]当前出现了多少个唯一值.
The trick is these values are constantly starting and finishing and go for different lengths of time. The most unique
values assigned to an individual any one time is 3. [On]
displays how many current unique values for [Place] and [Area] are occurring.
因此,这为我需要多少个人提供了具体的指导.例如3个unique
值一个= 1人,六个唯一值on = 2人
So this provides a concrete guide on how many individuals I need. e.g. 3 unique
values one = 1 person, 6 unique values on = 2 persons
我无法执行groupby
语句,其中我将assign
的第一个3 unique values
到individual 1
以及接下来的3个unique
值到individual 2
等.
I can't do a groupby
statement where I assign
the first 3 unique values
to individual 1
and the next 3 unique
values to individual 2
etc.
我想的是,当unique
值大于3时,我想先将[Area]
中的值分组,然后合并剩余的部分.因此,将[Area]
中的assign
相同值看成一个个体(最多3个).然后,如果存在_leftover_
个值(< 3),则应将它们组合成3的组,在可能的情况下.
What I envisage is, when unique
values are greater than 3 I want to group values in [Area]
first and then combine the leftovers. So look to assign
same values in [Area]
to an individual (up to 3). Then, if there are _leftover_
values (<3), they should be combined to make a group of 3, where possible.
我设想的工作方式是:hour
展望未来.对于每个新的row
值,script
应该看到[On]
会有多少个值(这表明需要多少个人).如果unique
值> 3,则它们应为assigned
乘以grouping
与[Area]
中相同的值.如果有剩余个值,则应将它们组合起来以组成3个一组.
The way I envisage this working is: see into the future by an hour
. For each new row
of values the script
should see how many values will be [On]
(this provides an indication of how many total individuals are required). Where unique
values are >3, they should be assigned
by grouping
the same value in [Area]
. If there are leftover values they should be combined anyhow to make up to a group of 3.
对于下面的df
,[Place]
和[Area]
出现的unique
值的数量在1-6之间变化.因此,我们不应有超过2个人assigned
.当unique
值> 3时,应首先由[Area]
分配. 剩余值应与其他少于3个unique
值的个体组合.
For the df
below, the number of unique
values occurring for [Place]
and [Area]
varies between 1-6. So we should never have more than 2 individuals assigned
. When unique
values are >3 it should be assigned by [Area]
first. The leftover values should be combined with other individuals that have less than 3 unique
values.
为大型df致歉.这是我可以复制问题的唯一方法!
import pandas as pd
import numpy as np
from collections import Counter
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:33:00','8:47:00','8:48:00','9:03:00','9:15:00','9:18:00','9:33:00','9:45:00','9:48:00','10:03:00','10:15:00','10:15:00','10:15:00','10:18:00','10:32:00','10:33:00','10:39:00','10:43:00','10:48:00','10:50:00','11:03:00','11:03:00','11:07:00','11:25:00','11:27:00','11:42:00','11:48:00','11:51:00','11:57:00','12:00:00','12:08:00','12:15:00','12:17:00','12:25:00','12:30:00','12:35:00','12:39:00','12:47:00','12:52:00','12:55:00','13:00:00','13:03:00','13:07:00','13:12:00','13:15:00','13:22:00','13:27:00','13:27:00'],
'Area' : ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','B','A','B','A','A','A','A','B','A','A','B','B','A','B','C','A','B','C','C','A','B','C','C','B','A','C','B','C','C','A','C','B','C','C','A','C'],
'Place' : ['House 1','House 2','House 3','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 4','House 1','House 2','House 1','House 1','House 4','House 3','House 2','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 2','House 1','House 4','House 1','House 1','House 2','House 1','House 4','House 1','House 1','House 3','House 2','House 4','House 1','House 2','House 4','House 1','House 4','House 2'],
'On' : ['1','2','3','3','3','3','3','3','3','3','3','3','3','3','4','5','5','5','5','5','5','4','3','3','3','2','2','2','2','3','3','3','4','4','4','4','4','4','4','4','4','4','4','4','4','4','5','6','6','6','6'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 2','Person 3','Person 1','Person 3','Person 1','Person 2','Person 1','Person 1','Person 3','Person 1','Person 2','Person 3','Person 3','Person 2','Person 3','Person 4','Person 2','Person 3','Person 4','Person 4','Person 2','Person 3','Person 4','Person 4','Person 3','Person 2','Person 4','Person 3','Person 4','Person 4','Person 2','Person 4','Person 3','Person 5','Person 4','Person 2','Person 4'],
})
df = pd.DataFrame(data=d)
def getAssignedPeople(df, areasPerPerson):
areas = df['Area'].values
places = df['Place'].values
times = pd.to_datetime(df['Time']).values
maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
assignmentCount = Counter()
assignedPeople = []
assignedPlaces = {}
heldPeople = {}
heldAreas = {}
holdAvailable = True
person = 0
# search for repeated areas. Mark them if the next repeat occurs within an hour
ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
holds = np.zeros(areas.size, dtype=bool)
holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')
for area,place,hold in zip(areas, places, holds):
if (area, place) in assignedPlaces:
# this unique (area, place) has already been assigned to someone
assignedPeople.append(assignedPlaces[(area, place)])
continue
if assignmentCount[person] >= areasPerPerson:
# the current person is already assigned to enough areas, move on to the next
a = heldPeople.pop(person, None)
heldAreas.pop(a, None)
person += 1
if area in heldAreas:
# assign to the person held in this area
p = heldAreas.pop(area)
heldPeople.pop(p)
else:
# get the first non-held person. If we need to hold in this area,
# also make sure the person has at least 2 free assignment slots,
# though if it's the last person assign to them anyway
p = person
while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
p += 1
assignmentCount.update([p])
assignedPlaces[(area, place)] = p
assignedPeople.append(p)
if hold:
if p==maxPerson:
# mark that there are no more people available to perform holds
holdAvailable = False
# this area recurrs in an hour, mark that the person should be held here
heldPeople[p] = area
heldAreas[area] = p
return assignedPeople
def allocatePeople(df, areasPerPerson=3):
assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
df = df.copy()
df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
return df
print(allocatePeople(df))
输出:
Time Area Place On Person
0 8:03:00 A House 1 1 Person 1
1 8:17:00 A House 2 2 Person 1
2 8:20:00 A House 3 3 Person 1
3 8:33:00 A House 1 3 Person 1
4 8:47:00 A House 3 3 Person 1
5 8:48:00 A House 2 3 Person 1
6 9:03:00 A House 1 3 Person 1
7 9:15:00 A House 3 3 Person 1
8 9:18:00 A House 2 3 Person 1
9 9:33:00 A House 1 3 Person 1
10 9:45:00 A House 3 3 Person 1
11 9:48:00 A House 2 3 Person 1
12 10:03:00 A House 1 3 Person 1
13 10:15:00 A House 3 3 Person 1
14 10:15:00 A House 4 4 Person 2
15 10:15:00 B House 1 5 Person 2
16 10:18:00 A House 2 5 Person 1
17 10:32:00 B House 1 5 Person 2
18 10:33:00 A House 1 5 Person 1
19 10:39:00 A House 4 5 Person 2
20 10:43:00 A House 3 5 Person 1
21 10:48:00 A House 2 4 Person 1
22 10:50:00 B House 1 3 Person 2
23 11:03:00 A House 1 3 Person 1
24 11:03:00 A House 4 3 Person 2
25 11:07:00 B House 1 2 Person 2
26 11:25:00 B House 1 2 Person 2
27 11:27:00 A House 4 2 Person 2
28 11:42:00 B House 1 2 Person 2
29 11:48:00 C House 1 3 Person 2
30 11:51:00 A House 4 3 Person 2
31 11:57:00 B House 1 3 Person 2
32 12:00:00 C House 2 4 Person 3
33 12:08:00 C House 1 4 Person 2
34 12:15:00 A House 4 4 Person 2
35 12:17:00 B House 1 4 Person 2
36 12:25:00 C House 1 4 Person 2
37 12:30:00 C House 2 4 Person 3
38 12:35:00 B House 1 4 Person 2
39 12:39:00 A House 4 4 Person 2
40 12:47:00 C House 1 4 Person 2
41 12:52:00 B House 1 4 Person 2
42 12:55:00 C House 3 4 Person 3
43 13:00:00 C House 2 4 Person 3
44 13:03:00 A House 4 4 Person 2
45 13:07:00 C House 1 4 Person 2
46 13:12:00 B House 2 5 Person 3
47 13:15:00 C House 4 6 Person 4
48 13:22:00 C House 1 6 Person 2
49 13:27:00 A House 4 6 Person 2
50 13:27:00 C House 2 6 Person 3
预期的输出和对我认为应该分配它的原因的评论:
Intended Output and Comments on why I think it should be assigned:
推荐答案
您看到的错误是由于(还有另一个)有趣的问题边缘情况造成的.在6th
作业期间,代码将person 2
分配给(A, House 4)
.然后,它看到区域A
在一个小时内重复出现,因此它将person 2
保留在该区域中.这将使person 2
无法用于下一个作业,该作业位于区域B
中.
The bug you're seeing is due to (yet another) interesting edge case of your problem. During the 6th
job, the code assigns person 2
to (A, House 4)
. It then then sees that area A
repeats within an hour, so it holds person 2
in that area. This makes person 2
unavailable for the next job, which is in area B
.
但是,由于在(A, House 1)
中发生的工作,没有理由将person 2
保留在区域A
中,因为区域和位置(A, House 1)
的唯一组合已经分配给person 1
However, there's no reason to hold person 2
in area A
for the sake of a job that occurs in (A, House 1)
, since the unique combination of area and place (A, House 1)
has already been assigned to person 1
.
在决定何时将一个人抱在一个区域中时,可以通过仅考虑区域和位置的唯一组合来解决该问题.只需更改几行代码即可.
The problem can be fixed by considering only unique combinations of area and place when deciding when to hold a person in an area. Only a couple of lines of code have to change.
首先,我们构造一个与唯一(区域,地点)对相对应的区域列表:
First, we construct a list of areas that correspond to the unique (area, place) pairs:
unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values
然后,在标识保全的代码的第一行中,将unqareas
替换为areas
:
Then we just substitute unqareas
for areas
in the first line of the code that identifies holds:
ixrep = np.argmax(np.triu(unqareas.reshape(-1, 1)==unqareas, k=1), axis=1)
完整的列表/测试
import pandas as pd
import numpy as np
from collections import Counter
d = ({
'Time' : ['8:03:00','8:07:00','8:10:00','8:23:00','8:27:00','8:30:00','8:37:00','8:40:00','8:48:00'],
'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 4','House 1','House 1'],
'Area' : ['A','A','A','A','A','A','A','B','A'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 2','Person 3','Person 1'],
'On' : ['1','2','3','3','3','3','4','5','5']
})
df = pd.DataFrame(data=d)
def getAssignedPeople(df, areasPerPerson):
areas = df['Area'].values
unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values
places = df['Place'].values
times = pd.to_datetime(df['Time']).values
maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
assignmentCount = Counter()
assignedPeople = []
assignedPlaces = {}
heldPeople = {}
heldAreas = {}
holdAvailable = True
person = 0
# search for repeated areas. Mark them if the next repeat occurs within an hour
ixrep = np.argmax(np.triu(unqareas.reshape(-1, 1)==unqareas, k=1), axis=1)
holds = np.zeros(areas.size, dtype=bool)
holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')
for area,place,hold in zip(areas, places, holds):
if (area, place) in assignedPlaces:
# this unique (area, place) has already been assigned to someone
assignedPeople.append(assignedPlaces[(area, place)])
continue
if assignmentCount[person] >= areasPerPerson:
# the current person is already assigned to enough areas, move on to the next
a = heldPeople.pop(person, None)
heldAreas.pop(a, None)
person += 1
if area in heldAreas:
# assign to the person held in this area
p = heldAreas.pop(area)
heldPeople.pop(p)
else:
# get the first non-held person. If we need to hold in this area,
# also make sure the person has at least 2 free assignment slots,
# though if it's the last person assign to them anyway
p = person
while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
p += 1
assignmentCount.update([p])
assignedPlaces[(area, place)] = p
assignedPeople.append(p)
if hold:
if p==maxPerson:
# mark that there are no more people available to perform holds
holdAvailable = False
# this area recurrs in an hour, mark that the person should be held here
heldPeople[p] = area
heldAreas[area] = p
return assignedPeople
def allocatePeople(df, areasPerPerson=3):
assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
df = df.copy()
df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
return df
print(allocatePeople(df))
输出:
Time Place Area Person On
0 8:03:00 House 1 A Person 1 1
1 8:07:00 House 2 A Person 1 2
2 8:10:00 House 3 A Person 1 3
3 8:23:00 House 1 A Person 1 3
4 8:27:00 House 2 A Person 1 3
5 8:30:00 House 3 A Person 1 3
6 8:37:00 House 4 A Person 2 4
7 8:40:00 House 1 B Person 2 5
8 8:48:00 House 1 A Person 1 5
这篇关于重新分配唯一值-Pandas DataFrame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!