Create new column with various conditional logic between other columns
Question
I have the following dataset:
test = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
'account':['a','a','a','b','b','b','c','c','c','d','e'],
'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570]})
to which I would like to append a new column called denied_true, based on the following rules:
- while denied_sum is less than tot_chg, return denied
- once denied_sum exceeds tot_chg, return the remaining difference: tot_chg less the sum of all prior denied_true values
- and if denied equals tot_chg at the first instance, just return denied and make the remaining rows for the account 0
The output should effectively look like the following dataframe:
output = pd.DataFrame({'date':['2018-08-01','2018-08-02','2018-08-03','2019-09-01','2019-09-02','2019-09-03','2020-01-02','2020-01-03','2020-01-04','2020-10-04','2020-10-05'],
'account':['a','a','a','b','b','b','c','c','c','d','e'],
'tot_chg':[2072,2072,2072,322,322,322,483,483,483,140,570],
'denied':[1878,1036,1036,322,161,161,150,322,322,105,570],
'denied_sum':[1878,2914,3950,322,483,644,150,472,794,105,570],
'denied_true':[1878,194,0,322,0,0,150,322,11,105,570]})
So far, I have tried the following code using where, but it is missing the condition that subtracts the previous denied_true values from tot_chg:
test['denied_true'] = test.denied_sum.to_numpy()
test['denied_true'] = test['denied_true'].where(test['denied_sum'].le(test['tot_chg']), other=0)
test
However, I'm not really sure how to add multiple conditions to this where call. Maybe I need if/elif loops, or a boolean mask. Any help would be greatly appreciated!
Recommended answer
You can convert the DataFrame into an OrderedDict and handle it in this straightforward way:
import pandas as pd
from collections import OrderedDict
test = pd.DataFrame({'date': ['2018-08-01', '2018-08-02', '2018-08-03', '2019-09-01', '2019-09-02', '2019-09-03', '2020-01-02', '2020-01-03', '2020-01-04', '2020-10-04', '2020-10-05'],
'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
'denied_sum': [1878, 2914, 3950, 322, 483, 644, 150, 472, 794, 105, 570]})
# convert DataFrame into OrderedDict
od = test.to_dict(into=OrderedDict)
# functions (samples)
# `data` is the OrderedDict of columns; `row` is the row index
def zero(data, row):
    # if denied == denied_sum (first row of an account)
    # change the dict...
    return data['denied'][row]

def ex(data, row):
    # if exceeds
    # change the dict...
    return 'exceed()'

def eq(data, row):
    # if equals
    # change the dict...
    return 'equal()'

def get_value(data, row):
    # conditions, checked in order
    if data['denied'][row] == data['denied_sum'][row]: return zero(data, row)
    if data['denied_sum'][row] < data['tot_chg'][row]: return data['denied'][row]
    if data['denied_sum'][row] > data['tot_chg'][row]: return ex(data, row)
    if data['denied_sum'][row] == data['tot_chg'][row]: return eq(data, row)
# MAIN
# make a list (column) of 'denied_true' values
denied_true_list = [(row, get_value(od, row)) for row in range(len(od["date"]))]
# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}
# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))
# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)
Input:
          date account  tot_chg  denied  denied_sum
0   2018-08-01       a     2072    1878        1878
1   2018-08-02       a     2072    1036        2914
2   2018-08-03       a     2072    1036        3950
3   2019-09-01       b      322     322         322
4   2019-09-02       b      322     161         483
5   2019-09-03       b      322     161         644
6   2020-01-02       c      483     150         150
7   2020-01-03       c      483     322         472
8   2020-01-04       c      483     322         794
9   2020-10-04       d      140     105         105
10  2020-10-05       e      570     570         570
Output:
          date account  tot_chg  denied  denied_sum  denied_true
0   2018-08-01       a     2072    1878        1878         1878
1   2018-08-02       a     2072    1036        2914     exceed()
2   2018-08-03       a     2072    1036        3950     exceed()
3   2019-09-01       b      322     322         322          322
4   2019-09-02       b      322     161         483     exceed()
5   2019-09-03       b      322     161         644     exceed()
6   2020-01-02       c      483     150         150          150
7   2020-01-03       c      483     322         472          322
8   2020-01-04       c      483     322         794     exceed()
9   2020-10-04       d      140     105         105          105
10  2020-10-05       e      570     570         570          570
Since this is just a sample, I didn't fully implement your logic in the functions.
Much the same (probably a bit more easily) could be done via DataFrame > JSON > DataFrame.
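To illustrate the DataFrame > JSON > DataFrame route, here is a minimal sketch. It uses only the first three rows of the question's data, and the rule inside the loop is a placeholder (keep denied while denied_sum fits within tot_chg, otherwise 0), not the full logic from the question. The names records, fits and result are my own.

```python
import json

import pandas as pd

# First three rows (account 'a') of the question's dataset
test = pd.DataFrame({
    'account': ['a', 'a', 'a'],
    'tot_chg': [2072, 2072, 2072],
    'denied': [1878, 1036, 1036],
    'denied_sum': [1878, 2914, 3950],
})

# DataFrame -> JSON string -> list of plain dicts, one per row
records = json.loads(test.to_json(orient='records'))

for record in records:
    # Placeholder rule only, not the question's full logic
    fits = record['denied_sum'] <= record['tot_chg']
    record['denied_true'] = record['denied'] if fits else 0

# list of dicts -> DataFrame
result = pd.DataFrame(records)
```

Each record is an ordinary dict, so any row-by-row logic can be written with plain Python before converting back.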
Update. I've tried to implement the function ex(). Here is how it might look:
def ex(data, row):
    # if exceeds
    # denied_true values computed so far, for all earlier rows
    denied_true_slice = denied_true_list[0:row]  # <-- global list
    tot_chg_slice = [data['tot_chg'][r] for r in range(row)]
    # sum the earlier denied_true values that are below their row's tot_chg
    denied_true_sum = sum(v for r, v in enumerate(denied_true_slice) if tot_chg_slice[r] > v)
    value = tot_chg_slice[-1] - denied_true_sum
    return value if value > 0 else 0
I'm not quite sure whether it works as intended, since I don't fully understand the quirky conditions. But I'm sure it looks rather ugly and cryptic, and it probably isn't in line with the best Stack Overflow examples.
Now there is a global list, so the MAIN section looks like this:
# MAIN
# make a list (column) of 'denied_true' values
denied_true_list = [] # <-- the global list
for row, _ in enumerate(od['date']):
    denied_true_list.append(get_value(od, row))
denied_true_list = [(row, value) for row, value in enumerate(denied_true_list)]
# convert the list into a dict
denied_true_dict = {'denied_true': OrderedDict(denied_true_list)}
# add the dict to the OrderedDict
od.update(OrderedDict(denied_true_dict))
# convert the OrderedDict back into DataFrame
test = pd.DataFrame(od)
Output:
          date account  tot_chg  denied  denied_sum  denied_true
0   2018-08-01       a     2072    1878        1878         1878
1   2018-08-02       a     2072    1036        2914          194
2   2018-08-03       a     2072    1036        3950            0
3   2019-09-01       b      322     322         322          322
4   2019-09-02       b      322     161         483            0
5   2019-09-03       b      322     161         644            0
6   2020-01-02       c      483     150         150          150
7   2020-01-03       c      483     322         472          322
8   2020-01-04       c      483     322         794            0
9   2020-10-04       d      140     105         105          105
10  2020-10-05       e      570     570         570          570
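Note that row 8 comes out as 0 here, whereas the question's expected output has 11 (483 minus the 472 already counted for account c). The reason is that ex() sums denied_true over all earlier rows rather than only those of the same account. Below is a minimal sketch of a plain loop that tracks the running total per account instead. It assumes the rules boil down to "each row gets its denied amount, capped at what is left of tot_chg for that account"; the function name denied_true_column and the allocated dict are my own, not part of the code above.

```python
accounts = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e']
tot_chg = [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570]
denied = [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570]


def denied_true_column(accounts, tot_chg, denied):
    """Walk the rows in order, capping each denied amount at what is left."""
    allocated = {}  # account -> sum of denied_true values emitted so far
    result = []
    for account, total, amount in zip(accounts, tot_chg, denied):
        # What is still available for this account, never below zero
        remaining = max(total - allocated.get(account, 0), 0)
        value = min(amount, remaining)
        allocated[account] = allocated.get(account, 0) + value
        result.append(value)
    return result


denied_true = denied_true_column(accounts, tot_chg, denied)
```

With the question's data this reproduces the denied_true column from the expected output, including the 11 in row 8.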
I believe it could be done much more neatly with native pandas tools.
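As one possible take on the native pandas route, here is a minimal sketch. It assumes the rules can be read as "each row gets its denied amount, capped at whatever part of tot_chg the account's earlier rows have not used up", and it recomputes the running sum with groupby rather than relying on the existing denied_sum column. The variable names prior and remaining are my own.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    'account': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'e'],
    'tot_chg': [2072, 2072, 2072, 322, 322, 322, 483, 483, 483, 140, 570],
    'denied': [1878, 1036, 1036, 322, 161, 161, 150, 322, 322, 105, 570],
})

# Cumulative denied amount per account *before* the current row
prior = test.groupby('account')['denied'].cumsum() - test['denied']

# Part of tot_chg not yet used up by earlier rows, never below zero
remaining = (test['tot_chg'] - prior).clip(lower=0)

# Each row keeps its own denied amount, capped at what is remaining
test['denied_true'] = np.minimum(test['denied'], remaining)
```

On the question's data this yields the expected denied_true column without any explicit Python loop; the rows are assumed to already be in date order within each account.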