重新分配唯一值-Pandas DataFrame [英] Re-assign unique values - pandas DataFrame

查看:104
本文介绍了重新分配唯一值-Pandas DataFrame的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将pandas df中的assign unique值分配给特定的个人.

I am trying to assign unique values in pandas df to specific individuals.

对于下面的df[Area][Place]将共同构成不同的职位unique值.这些值将分配给个人,其总体目标是使用尽可能少的个人.

For the df below, [Area] and [Place] will together make up unique values that are various jobs. These values will be assigned to individuals with the overall aim to use the least amount of individuals possible.

诀窍是这些值会不断地开始和结束,并持续不同的时间长度.一次分配给一个人的最多unique个值是 3 . [On]显示[Place]和[Area]当前出现了多少个唯一值.

The trick is these values are constantly starting and finishing and go for different lengths of time. The most unique values assigned to an individual any one time is 3. [On] displays how many current unique values for [Place] and [Area] are occurring.

因此,这为我需要多少个人提供了具体的指导.例如3个unique值一个= 1人,六个唯一值on = 2人

So this provides a concrete guide on how many individuals I need. e.g. 3 unique values one = 1 person, 6 unique values on = 2 persons

我无法执行groupby语句,其中我将assign的第一个3 unique valuesindividual 1以及接下来的3个unique值到individual 2等.

I can't do a groupby statement where I assign the first 3 unique values to individual 1 and the next 3 unique values to individual 2 etc.

我想的是,当unique值大于3时,我想先将[Area]中的值分组,然后合并剩余的部分.因此,将[Area]中的assign相同值看成一个个体(最多3个).然后,如果存在_leftover_个值(< 3),则应将它们组合成3的组,在可能的情况下.

What I envisage is, when unique values are greater than 3 I want to group values in [Area] first and then combine the leftovers. So look to assign same values in [Area] to an individual (up to 3). Then, if there are _leftover_ values (<3), they should be combined to make a group of 3, where possible.

我设想的工作方式是:hour展望未来.对于每个新的row值,script应该看到[On]会有多少个值(这表明需要多少个人).如果unique值> 3,则它们应为assigned乘以grouping[Area]中相同的值.如果有剩余个值,则应将它们组合起来以组成3个一组.

The way I envisage this working is: see into the future by an hour. For each new row of values the script should see how many values will be [On](this provides an indication of how many total individuals are required). Where unique values are >3, they should be assigned by grouping the same value in [Area]. If there are leftover values they should be combined anyhow to make up to a group of 3.

对于下面的df[Place][Area]出现的unique值的数量在1-6之间变化.因此,我们不应有超过2个人assigned.当unique值> 3时,应首先由[Area]分配. 剩余值应与其他少于3个unique值的个体组合.

For the df below, the number of unique values occurring for [Place] and [Area] varies between 1-6. So we should never have more than 2 individuals assigned. When unique values are >3 it should be assigned by [Area] first. The leftover values should be combined with other individuals that have less than 3 unique values.

为大型df致歉.这是我可以复制问题的唯一方法!

import pandas as pd
import numpy as np
from collections import Counter

d = ({   
    'Time' : ['8:03:00','8:17:00','8:20:00','8:33:00','8:47:00','8:48:00','9:03:00','9:15:00','9:18:00','9:33:00','9:45:00','9:48:00','10:03:00','10:15:00','10:15:00','10:15:00','10:18:00','10:32:00','10:33:00','10:39:00','10:43:00','10:48:00','10:50:00','11:03:00','11:03:00','11:07:00','11:25:00','11:27:00','11:42:00','11:48:00','11:51:00','11:57:00','12:00:00','12:08:00','12:15:00','12:17:00','12:25:00','12:30:00','12:35:00','12:39:00','12:47:00','12:52:00','12:55:00','13:00:00','13:03:00','13:07:00','13:12:00','13:15:00','13:22:00','13:27:00','13:27:00'],
    'Area' : ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','B','A','B','A','A','A','A','B','A','A','B','B','A','B','C','A','B','C','C','A','B','C','C','B','A','C','B','C','C','A','C','B','C','C','A','C'],
    'Place' : ['House 1','House 2','House 3','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 4','House 1','House 2','House 1','House 1','House 4','House 3','House 2','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 2','House 1','House 4','House 1','House 1','House 2','House 1','House 4','House 1','House 1','House 3','House 2','House 4','House 1','House 2','House 4','House 1','House 4','House 2'],
    'On' : ['1','2','3','3','3','3','3','3','3','3','3','3','3','3','4','5','5','5','5','5','5','4','3','3','3','2','2','2','2','3','3','3','4','4','4','4','4','4','4','4','4','4','4','4','4','4','5','6','6','6','6'],
    'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 2','Person 3','Person 1','Person 3','Person 1','Person 2','Person 1','Person 1','Person 3','Person 1','Person 2','Person 3','Person 3','Person 2','Person 3','Person 4','Person 2','Person 3','Person 4','Person 4','Person 2','Person 3','Person 4','Person 4','Person 3','Person 2','Person 4','Person 3','Person 4','Person 4','Person 2','Person 4','Person 3','Person 5','Person 4','Person 2','Person 4'],
    })

df = pd.DataFrame(data=d)

def getAssignedPeople(df, areasPerPerson):
    areas = df['Area'].values
    places = df['Place'].values
    times = pd.to_datetime(df['Time']).values
    maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
    assignmentCount = Counter()
    assignedPeople = []
    assignedPlaces = {}
    heldPeople = {}
    heldAreas = {}
    holdAvailable = True
    person = 0

    # search for repeated areas. Mark them if the next repeat occurs within an hour
    ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
    holds = np.zeros(areas.size, dtype=bool)
    holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

    for area,place,hold in zip(areas, places, holds):
        if (area, place) in assignedPlaces:
            # this unique (area, place) has already been assigned to someone
            assignedPeople.append(assignedPlaces[(area, place)])
            continue

        if assignmentCount[person] >= areasPerPerson:
            # the current person is already assigned to enough areas, move on to the next
            a = heldPeople.pop(person, None)
            heldAreas.pop(a, None)
            person += 1

        if area in heldAreas:
            # assign to the person held in this area
            p = heldAreas.pop(area)
            heldPeople.pop(p)
        else:
            # get the first non-held person. If we need to hold in this area, 
            # also make sure the person has at least 2 free assignment slots,
            # though if it's the last person assign to them anyway 
            p = person
            while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
                p += 1

        assignmentCount.update([p])
        assignedPlaces[(area, place)] = p
        assignedPeople.append(p)

        if hold:
            if p==maxPerson:
                # mark that there are no more people available to perform holds
                holdAvailable = False

            # this area recurrs in an hour, mark that the person should be held here
            heldPeople[p] = area
            heldAreas[area] = p

    return assignedPeople

def allocatePeople(df, areasPerPerson=3):
    assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
    df = df.copy()
    df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
    return df

print(allocatePeople(df))

输出:

        Time Area    Place On    Person
0    8:03:00    A  House 1  1  Person 1
1    8:17:00    A  House 2  2  Person 1
2    8:20:00    A  House 3  3  Person 1
3    8:33:00    A  House 1  3  Person 1
4    8:47:00    A  House 3  3  Person 1
5    8:48:00    A  House 2  3  Person 1
6    9:03:00    A  House 1  3  Person 1
7    9:15:00    A  House 3  3  Person 1
8    9:18:00    A  House 2  3  Person 1
9    9:33:00    A  House 1  3  Person 1
10   9:45:00    A  House 3  3  Person 1
11   9:48:00    A  House 2  3  Person 1
12  10:03:00    A  House 1  3  Person 1
13  10:15:00    A  House 3  3  Person 1
14  10:15:00    A  House 4  4  Person 2
15  10:15:00    B  House 1  5  Person 2
16  10:18:00    A  House 2  5  Person 1
17  10:32:00    B  House 1  5  Person 2
18  10:33:00    A  House 1  5  Person 1
19  10:39:00    A  House 4  5  Person 2
20  10:43:00    A  House 3  5  Person 1
21  10:48:00    A  House 2  4  Person 1
22  10:50:00    B  House 1  3  Person 2
23  11:03:00    A  House 1  3  Person 1
24  11:03:00    A  House 4  3  Person 2
25  11:07:00    B  House 1  2  Person 2
26  11:25:00    B  House 1  2  Person 2
27  11:27:00    A  House 4  2  Person 2
28  11:42:00    B  House 1  2  Person 2
29  11:48:00    C  House 1  3  Person 2
30  11:51:00    A  House 4  3  Person 2
31  11:57:00    B  House 1  3  Person 2
32  12:00:00    C  House 2  4  Person 3
33  12:08:00    C  House 1  4  Person 2
34  12:15:00    A  House 4  4  Person 2
35  12:17:00    B  House 1  4  Person 2
36  12:25:00    C  House 1  4  Person 2
37  12:30:00    C  House 2  4  Person 3
38  12:35:00    B  House 1  4  Person 2
39  12:39:00    A  House 4  4  Person 2
40  12:47:00    C  House 1  4  Person 2
41  12:52:00    B  House 1  4  Person 2
42  12:55:00    C  House 3  4  Person 3
43  13:00:00    C  House 2  4  Person 3
44  13:03:00    A  House 4  4  Person 2
45  13:07:00    C  House 1  4  Person 2
46  13:12:00    B  House 2  5  Person 3
47  13:15:00    C  House 4  6  Person 4
48  13:22:00    C  House 1  6  Person 2
49  13:27:00    A  House 4  6  Person 2
50  13:27:00    C  House 2  6  Person 3

预期的输出和对我认为应该分配它的原因的评论:

Intended Output and Comments on why I think it should be assigned:

推荐答案

此答案有一个实时版本在线,您可以自己尝试.

您看到的错误是由于(还有另一个)有趣的问题边缘情况造成的.在6th作业期间,代码将person 2分配给(A, House 4).然后,它看到区域A在一个小时内重复出现,因此它将person 2保留在该区域中.这将使person 2无法用于下一个作业,该作业位于区域B中.

The bug you're seeing is due to (yet another) interesting edge case of your problem. During the 6th job, the code assigns person 2 to (A, House 4). It then then sees that area A repeats within an hour, so it holds person 2 in that area. This makes person 2 unavailable for the next job, which is in area B.

但是,由于在(A, House 1)中发生的工作,没有理由将person 2保留在区域A中,因为区域和位置(A, House 1)的唯一组合已经分配给person 1

However, there's no reason to hold person 2 in area A for the sake of a job that occurs in (A, House 1), since the unique combination of area and place (A, House 1) has already been assigned to person 1.

在决定何时将一个人抱在一个区域中时,可以通过仅考虑区域和位置的唯一组合来解决该问题.只需更改几行代码即可.

The problem can be fixed by considering only unique combinations of area and place when deciding when to hold a person in an area. Only a couple of lines of code have to change.

首先,我们构造一个与唯一(区域,地点)对相对应的区域列表:

First, we construct a list of areas that correspond to the unique (area, place) pairs:

unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values

然后,在标识保全的代码的第一行中,将unqareas替换为areas:

Then we just substitute unqareas for areas in the first line of the code that identifies holds:

ixrep = np.argmax(np.triu(unqareas.reshape(-1, 1)==unqareas, k=1), axis=1)

完整的列表/测试

import pandas as pd
import numpy as np
from collections import Counter

d = ({
     'Time' : ['8:03:00','8:07:00','8:10:00','8:23:00','8:27:00','8:30:00','8:37:00','8:40:00','8:48:00'],
     'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 4','House 1','House 1'],
     'Area' : ['A','A','A','A','A','A','A','B','A'],
     'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 2','Person 3','Person 1'],
     'On' : ['1','2','3','3','3','3','4','5','5']
     })

df = pd.DataFrame(data=d)

def getAssignedPeople(df, areasPerPerson):
    areas = df['Area'].values
    unqareas = df[['Area', 'Place']].drop_duplicates()['Area'].values
    places = df['Place'].values
    times = pd.to_datetime(df['Time']).values

    maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
    assignmentCount = Counter()
    assignedPeople = []
    assignedPlaces = {}
    heldPeople = {}
    heldAreas = {}
    holdAvailable = True
    person = 0

    # search for repeated areas. Mark them if the next repeat occurs within an hour
    ixrep = np.argmax(np.triu(unqareas.reshape(-1, 1)==unqareas, k=1), axis=1)
    holds = np.zeros(areas.size, dtype=bool)
    holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

    for area,place,hold in zip(areas, places, holds):
        if (area, place) in assignedPlaces:
            # this unique (area, place) has already been assigned to someone
            assignedPeople.append(assignedPlaces[(area, place)])
            continue

        if assignmentCount[person] >= areasPerPerson:
            # the current person is already assigned to enough areas, move on to the next
            a = heldPeople.pop(person, None)
            heldAreas.pop(a, None)
            person += 1

        if area in heldAreas:
            # assign to the person held in this area
            p = heldAreas.pop(area)
            heldPeople.pop(p)
        else:
            # get the first non-held person. If we need to hold in this area, 
            # also make sure the person has at least 2 free assignment slots,
            # though if it's the last person assign to them anyway 
            p = person
            while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
                p += 1

        assignmentCount.update([p])
        assignedPlaces[(area, place)] = p
        assignedPeople.append(p)

        if hold:
            if p==maxPerson:
                # mark that there are no more people available to perform holds
                holdAvailable = False

            # this area recurrs in an hour, mark that the person should be held here
            heldPeople[p] = area
            heldAreas[area] = p

    return assignedPeople

def allocatePeople(df, areasPerPerson=3):
    assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
    df = df.copy()
    df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
    return df

print(allocatePeople(df))

输出:

      Time    Place Area    Person On
0  8:03:00  House 1    A  Person 1  1
1  8:07:00  House 2    A  Person 1  2
2  8:10:00  House 3    A  Person 1  3
3  8:23:00  House 1    A  Person 1  3
4  8:27:00  House 2    A  Person 1  3
5  8:30:00  House 3    A  Person 1  3
6  8:37:00  House 4    A  Person 2  4
7  8:40:00  House 1    B  Person 2  5
8  8:48:00  House 1    A  Person 1  5

这篇关于重新分配唯一值-Pandas DataFrame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆