Update a function in Python when the first two columns don't exist
Question
I have created a function that checks three columns and applies the conditions defined in the function. The first column (col0) defaults to None. This is what my columns look like:
rule_id col0 col1 col2
50378 2 0 0
50402 12 9 6
52879 0 4 3
Here the 'rule_id' column is the index.
This is my code:
import numpy as np
import pandas as pd

for i, j in dframe.groupby('tx_id'):
    df1 = pd.DataFrame(j)
    df = df1.pivot_table(index='rule_id', columns=['date'], values='rid_fc',
                         aggfunc=np.sum, fill_value=0)
    coeff = df.T
    # compute the coefficients
    for name, s in coeff.items():
        top = 100  # start at 100
        r = []
        for i, v in enumerate(s):
            if v == 0:  # reset to 100 on a 0 value
                top = 100
            else:
                top = top / 2  # else half the previous value
            r.append(top)
        coeff.loc[:, name] = r  # set the whole column in one operation
    # transpose back to have a companion dataframe for df
    coeff = coeff.T

    def build_comp(col1, col2, i, col0=None):
        conditions = [(df[col1] == 0) & (df[col2] == 0),
                      (df[col1] == df[col2]),
                      (df[col1] != 0) & (df[col2] != 0) & (df[col1] > df[col2]),
                      (df[col1] != 0) & (df[col2] != 0) & (df[col1] < df[col2]),
                      (df[col1] != 0) & (df[col2] == 0)]
        choices = [np.nan, coeff[col1], df[col2] / df[col1] * coeff[col1],
                   df[col2] / df[col1] * coeff[col1], 100]
        condition = [(df[col2] != 0), (df[col2] == 0)]
        choice = [100, np.nan]
        if col0 is not None:
            conditions.insert(1, (df[col1] != 0) & (df[col2] == 0) & (df[col0] != 0))
            choices.insert(1, 25)
            condition.insert(0, (df[col2] != 0) & (df[col1] != 0))
            choice.insert(0, 25)
        if col0 is None:
            condition.insert(0, (df[col2] != 0) & (df[col1] != 0))
            choice.insert(0, 25)
        df['comp{}'.format(i)] = np.select(conditions, choices, default=np.nan)
        df['comp{}'.format(i + 1)] = np.select(condition, choice)

    col_ref = None
    col_prev = df.columns[0]
    for i, col in enumerate(df.columns[1:], 1):
        build_comp(col_prev, col, i, col_ref)
        col_ref = col_prev
        col_prev = col
    if len(df.columns) == 1:
        df['comp1'] = [100] * len(df)
'df' is the dataframe that has these columns. As you can see, the function involves multiple conditions. I want to add one more, for the case where both col0 and col1 are None, but I don't know how. I tried adding a condition inside the "if col0 is None:" branch, like this:
if col1 is None:
    conditions.insert(0, (df[col2] != 0))
    choices.insert(0, 100)
But it's not working. Suppose I have only one column (col2) and neither col0 nor col1 is present. According to my condition, the result should then look like this:
rule_id col2 comp1
50378 2 100
51183 3 100
But the comp column is not being created. If you could help me achieve that, I'd greatly appreciate it.
Current code (edit): I made some alterations after trying the code @Joël suggested. This is the code:
def build_comp(col2, i, col0=None, col1=None):
    conditions = [(df[col1] == df[col2]) & (df[col1] != 0) & (df[col2] != 0),
                  (df[col1] != 0) & (df[col2] != 0) & (df[col1] > df[col2]),
                  (df[col1] != 0) & (df[col2] != 0) & (df[col1] < df[col2]),
                  (df[col1] != 0) & (df[col2] == 0)]
    choices = [50, df[col2] / df[col1] * 50, df[col2] / df[col1] * 25, 100]
    condition = [(df[col2] != 0), (df[col2] == 0)]
    choice = [100, np.nan]
    if col0 is not None:
        conditions.insert(1, (df[col1] != 0) & (df[col2] == 0) &
                          (df[col0] != 0))
        choices.insert(1, 25)
        condition.insert(0, (df[col2] != 0) & (df[col1] != 0))
        choice.insert(0, 25)
    else:
        condition.insert(0, (df[col2] != 0) & (df[col1] != 0))
        choice.insert(0, 25)
        if col1 is None:
            conditions.insert(0, (df[col2] != 0))
            choices.insert(0, 100)
            conditions.insert(0, (df[col2] == 0))
            choices.insert(0, np.nan)
    df['comp{}'.format(i)] = np.select(conditions, choices, default=np.nan)
    df['comp{}'.format(i + 1)] = np.select(condition, choice)

col_ref = None
col_prev = df.columns[0]
for i, col in enumerate(df.columns[1:], 1):
    build_comp(col, i, col_ref, col_prev)
    col_ref = col_prev
    col_prev = col
When I run this code, I still don't get the comp column. This is what I get:
rule_id col2
50378 2
51183 3
But according to my logic, I should get this:
rule_id col2 comp1
50378 2 100
51183 3 100
I know there is something wrong with the for loop and the col_prev logic, but I don't know what.
Edit: To simplify further, this is what my df looks like:
This is what my df looks like after applying my code:
But now suppose only one timestamp column is present, such as this:
Then I want the result to be this:
date 2018-12-11 13:41:51 comp1
rule_id
51183 1 100
52368 1 100
Recommended answer
When df has a single column, df.columns[1:] is empty, so the for loop gets skipped (i.e. the code in the loop does not get executed) and build_comp is never called.
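A minimal sketch of why nothing is created, assuming a one-column frame shaped like the example in the question (the column name col2 and the values come from that example; the calls list is only a stand-in for the real build_comp call):

```python
import pandas as pd

# One-column frame shaped like the question's example; rule_id is the index.
df = pd.DataFrame({"col2": [2, 3]},
                  index=pd.Index([50378, 51183], name="rule_id"))

remaining = df.columns[1:]      # every column after the first one
print(len(remaining))           # 0: there is nothing left to pair up

calls = []
for i, col in enumerate(remaining, 1):
    calls.append((i, col))      # stands in for build_comp(...)

print(calls)                    # []: the loop body never ran
```

Because the loop body never runs, no comp column can appear, whatever conditions are placed inside build_comp.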
To add a column for the case where df has a single column, add the following code at the end:
if len(df.columns) == 1:
    df['comp1'] = [100] * len(df)
This assumes that rule_id is the row labels (the index). If it is an ordinary column instead, compare with 2 instead of 1.
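The difference between the two cases can be sketched as follows, using made-up data modelled on the question's example. The helper add_single_comp and its n_data_cols parameter are hypothetical names introduced only for this illustration; they are not part of the original code.

```python
import pandas as pd

# Made-up data modelled on the question's example.
raw = pd.DataFrame({"rule_id": [50378, 51183], "col2": [2, 3]})

as_index = raw.set_index("rule_id")   # rule_id becomes the row labels
print(len(as_index.columns))          # 1: only col2 counts as a column
print(len(raw.columns))               # 2: rule_id is an ordinary column here

def add_single_comp(df, n_data_cols=1):
    # Hypothetical helper: add comp1 only when the loop over column pairs
    # would have been skipped, i.e. when there is a single data column.
    if len(df.columns) == n_data_cols:
        df["comp1"] = 100             # the scalar is broadcast to every row
    return df

print(add_single_comp(as_index))            # rule_id as index: compare with 1
print(add_single_comp(raw, n_data_cols=2))  # rule_id as column: compare with 2
```

Assigning the scalar 100 has the same effect here as [100] * len(df) in the answer's snippet.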