Create new column with various conditional logic between other columns


Problem description

I have the following dataset:

test = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
                    'account':['a','a','a','b','b','b','c','c','c','d','e'],
                    'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
                    'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
                    'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570]})

To this I would like to append a new column called denied_true, based on the following rules:


  1. While denied_sum is less than tot_chg, return denied.
  2. Once denied_sum exceeds tot_chg, return the remaining difference, i.e. tot_chg less the sum of all prior denied_true values for the account.
  3. If denied equals tot_chg at the first instance, just return denied and set the remaining rows for the account to 0.

The output should effectively look like this, given as a DataFrame:

output = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
                    'account':['a','a','a','b','b','b','c','c','c','d','e'],
                    'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
                    'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
                    'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570],
                    'denied_true':[1878,194,0,322,0,0,150,322,11,105,570]})

So far, I have tried the following code using where, but it is missing the condition that subtracts the previous denied_true values from tot_chg:

test['denied_true'] = test.denied_sum.to_numpy()
test.denied_true.where(test.denied_sum.le(test.tot_chg),other=0,inplace=True)
test

However, I'm not really sure how to attach multiple conditions to this where call. Maybe I need if/elif logic in a loop, or a boolean mask. Any help would be greatly appreciated!

Recommended answer

You can convert the DataFrame into an OrderedDict and handle it in this straightforward way:

import pandas as pd
from collections import OrderedDict

test = pd.DataFrame({'date':      ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03', '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
                    'account':    ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
                    'tot_chg':    [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
                    'denied':     [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
                    'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})
        
# convert DataFrame into OrderedDict
od = test.to_dict(into=OrderedDict)

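# functions (samples); the argument is named `data` to avoid shadowing the built-in dict
def zero(data, row):
    # called when denied == denied_sum (true for the first row of each account in this data)
    # placeholder: change the data here if needed
    return data['denied'][row]

def ex(data, row):
    # called when denied_sum exceeds tot_chg
    # placeholder: not implemented in this sample
    return 'exceed()'

def eq(data, row):
    # called when denied_sum equals tot_chg
    # placeholder: not implemented in this sample
    return 'equal()'

def get_value(data, row):
    # conditions are checked in order; the first match wins
    if data['denied'][row]     == data['denied_sum'][row]: return zero(data, row)
    if data['denied_sum'][row] <  data['tot_chg'][row]:    return data['denied'][row]
    if data['denied_sum'][row] >  data['tot_chg'][row]:    return ex(data, row)
    if data['denied_sum'][row] == data['tot_chg'][row]:    return eq(data, row)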


# MAIN

# make a list (column) of 'denied_true' values
denied_true_list = [(row, get_value(od, row)) for row in range(len(od["date"]))]

# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}

# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))

# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)

Input:

          date account  tot_chg  denied  denied_sum
0   2018-08-01       a     2072    1878        1878
1   2018-08-02       a     2072    1036        2914
2   2018-08-03       a     2072    1036        3950
3   2019-09-01       b      322     322         322
4   2019-09-02       b      322     161         483
5   2019-09-03       b      322     161         644
6   2020-01-02       c      483     150         150
7   2020-01-03       c      483     322         472
8   2020-01-04       c      483     322         794
9   2020-10-04       d      140     105         105
10  2020-10-05       e      570     570         570

Output:

          date account  tot_chg  denied  denied_sum denied_true
0   2018-08-01       a     2072    1878        1878        1878
1   2018-08-02       a     2072    1036        2914    exceed()
2   2018-08-03       a     2072    1036        3950    exceed()
3   2019-09-01       b      322     322         322         322
4   2019-09-02       b      322     161         483    exceed()
5   2019-09-03       b      322     161         644    exceed()
6   2020-01-02       c      483     150         150         150
7   2020-01-03       c      483     322         472         322
8   2020-01-04       c      483     322         794    exceed()
9   2020-10-04       d      140     105         105         105
10  2020-10-05       e      570     570         570         570

I didn't fully implement your logic in the functions, since this is just a sample.

Much the same (and probably a bit more easily) can be done via DataFrame > JSON > DataFrame.
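As a rough illustration of the DataFrame > JSON > DataFrame route, the sketch below round-trips a three-row excerpt of account 'a' through a list of plain dicts. The per-record rule is only a placeholder in the spirit of the sample functions above (an assumption for illustration, not the full logic from the question), and the names records and result are made up here.

```python
import json
import pandas as pd

# three-row excerpt of the question's data (account 'a' only)
test = pd.DataFrame({
    'account':    ['a', 'a', 'a'],
    'tot_chg':    [2072, 2072, 2072],
    'denied':     [1878, 1036, 1036],
    'denied_sum': [1878, 2914, 3950],
})

# DataFrame -> JSON text -> list of plain dicts, one dict per row
records = json.loads(test.to_json(orient='records'))

for rec in records:
    # placeholder rule: keep `denied` while the running sum stays within
    # tot_chg, otherwise just mark the row (like the sample ex() above)
    if rec['denied_sum'] <= rec['tot_chg']:
        rec['denied_true'] = rec['denied']
    else:
        rec['denied_true'] = 'exceed()'

# list of dicts -> DataFrame
result = pd.DataFrame(records)
print(result)
```

Working on a list of records keeps each row together as one dict, which can be easier to read than indexing every column by row number.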

Update. I've tried to implement the function ex(). Here is how it might look:

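def ex(data, row):
    # if exceeds: return what is left of tot_chg after the earlier denied_true values
    denied_true_slice = denied_true_list[0:row] # <-- global list, holds all earlier rows
    tot_chg_slice     = [data['tot_chg'][r] for r in range(row)]
    # sum the earlier values that are below their own row's tot_chg
    denied_true_sum   = sum(v for r, v in enumerate(denied_true_slice) if tot_chg_slice[r] > v)
    # tot_chg_slice[-1] is the tot_chg of the previous row
    value = tot_chg_slice[-1] - denied_true_sum
    return value if value > 0 else 0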

I'm not quite sure it works as intended, since I don't fully understand the quirky conditions. I'm sure it looks rather ugly and cryptic, though, and it probably isn't in line with the best Stack Overflow examples.

There is now a global list, so the MAIN section looks like this:

# MAIN

# make a list (column) of 'denied_true' values
denied_true_list = [] # <-- the global list
for row, _ in enumerate(od['date']):
    denied_true_list.append(get_value(od,row))

denied_true_list = [(row, value) for row, value in enumerate(denied_true_list)]

# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}

# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))

# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)

Output:

          date account  tot_chg  denied  denied_sum  denied_true
0   2018-08-01       a     2072    1878        1878         1878
1   2018-08-02       a     2072    1036        2914          194
2   2018-08-03       a     2072    1036        3950            0
3   2019-09-01       b      322     322         322          322
4   2019-09-02       b      322     161         483            0
5   2019-09-03       b      322     161         644            0
6   2020-01-02       c      483     150         150          150
7   2020-01-03       c      483     322         472          322
8   2020-01-04       c      483     322         794            0
9   2020-10-04       d      140     105         105          105
10  2020-10-05       e      570     570         570          570
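In the table above, row 8 comes out as 0, whereas the question's expected output has 11 there. The likely cause is that ex() sums over all earlier rows rather than only those of the current account. A per-account variant might look like the sketch below. It assumes the rules boil down to "each row receives denied, but never more than what is left of the account's tot_chg"; the helper name denied_true_column is hypothetical.

```python
from collections import defaultdict

account = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e']
tot_chg = [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570]
denied  = [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570]

def denied_true_column(account, tot_chg, denied):
    """Hypothetical helper: cap each account's running denied total at tot_chg."""
    allocated = defaultdict(int)  # denied_true handed out so far, per account
    out = []
    for acc, cap, d in zip(account, tot_chg, denied):
        remaining = max(cap - allocated[acc], 0)  # what is left of tot_chg
        value = min(d, remaining)                 # never exceed the remainder
        allocated[acc] += value
        out.append(value)
    return out

denied_true = denied_true_column(account, tot_chg, denied)
print(denied_true)
```

Keeping the running total in a dict keyed by account avoids the global list and keeps one account's rows from affecting another's.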

I believe it could be done much more neatly with native pandas tools.
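One way this might be done with native pandas is to cap the running sum at tot_chg and then take row-to-row differences within each account. The sketch below assumes that denied_sum is the per-account cumulative sum of denied (as it is in the question's data) and that the rules amount to capping that running total; it is one reading of the rules, not a confirmed solution.

```python
import pandas as pd

test = pd.DataFrame({
    'account':    ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
    'tot_chg':    [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
    'denied':     [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
    'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570],
})

# cap the running total at the account's tot_chg, row by row
capped = test['denied_sum'].clip(upper=test['tot_chg'])

# increase of the capped total from one row to the next within each account;
# the first row of an account has no predecessor (NaN), so it takes the
# capped value itself
step = capped.groupby(test['account']).diff()
test['denied_true'] = step.fillna(capped).astype(int)

print(test)
```

If denied_sum were not already available, it could presumably be derived first with test.groupby('account')['denied'].cumsum().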
