Python: numpy/pandas change values on condition
I would like to know if there is a faster and more "pythonic" way of doing the following, e.g. using some built-in methods. Given a pandas DataFrame or a numpy array of floats, if a value is less than or equal to 0.5, I need to calculate its reciprocal, multiply it by -1, and replace the old value with the newly calculated one. "Transform" is probably a bad choice of words; please tell me if you have a better/more accurate description.
Thank you for your help and support!!
Data:
import numpy as np
import pandas as pd
dicti = {"A" : np.arange(0.0, 3, 0.1),
         "B" : np.arange(0, 30, 1),
         "C" : list("ELVISLIVES")*3}
df = pd.DataFrame(dicti)
My function:
def transform_colname(df, colname):
    series = df[colname]
    newval_list = []
    for val in series:
        if val <= 0.5:
            newval = (1/val)*-1
            newval_list.append(newval)
        else:
            newval_list.append(val)
    df[colname] = newval_list
    return df
function call:
transform_colname(df, colname="A")
**--> I'm summing up the results here, since comments don't allow posting code (or I don't know how to do it).**
Thank you all for your fast and great answers!!
Using IPython "%timeit" with "real" data:
My function: 10 loops, best of 3: 24.1 ms per loop
From jojo:
def transform_colname_v2(df, colname):
    series = df[colname]
    df[colname] = np.where(series <= 0.5, 1/series*-1, series)
    return df
100 loops, best of 3: 2.76 ms per loop
From FooBar:
def transform_colname_v3(df, colname):
    df.loc[df[colname] <= 0.5, colname] = - 1 / df[colname][df[colname] <= 0.5]
    return df
100 loops, best of 3: 3.32 ms per loop
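From dmvianna:
def transform_colname_v4(df, colname):
    # Series.where keeps values where the condition is True and replaces the rest,
    # so the condition must be "> 0.5" here (the opposite of the np.where version).
    df[colname] = df[colname].where(df[colname] > 0.5, (1/df[colname])*-1)
    return df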
100 loops, best of 3: 3.7 ms per loop
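Before comparing timings, it may be worth checking that the vectorized versions agree with the loop. The following is only a sketch: the helper names and the sample Series are hypothetical (the sample starts at 0.1 rather than 0.0 so that no division by zero is involved), and each helper returns a new Series instead of modifying a DataFrame.

```python
import numpy as np
import pandas as pd

def loop_version(series):
    # Reference behaviour: the element-wise loop from the question.
    return pd.Series([-1 / v if v <= 0.5 else v for v in series], index=series.index)

def np_where_version(series):
    # np.where(condition, value_if_true, value_if_false)
    return pd.Series(np.where(series <= 0.5, -1 / series, series), index=series.index)

def loc_version(series):
    # Assign only to the rows selected by the boolean mask.
    out = series.copy()
    mask = out <= 0.5
    out.loc[mask] = -1 / out.loc[mask]
    return out

def series_where_version(series):
    # Series.where keeps values where the condition is True, hence "> 0.5".
    return series.where(series > 0.5, -1 / series)

# Assumed sample data without zeros (not the "real" data from the timings).
sample = pd.Series(np.arange(0.1, 3.0, 0.1))

expected = loop_version(sample)
results = [np_where_version(sample), loc_version(sample), series_where_version(sample)]
all_equal = all(np.allclose(expected, r) for r in results)
print(all_equal)
```

If `all_equal` is False for some input, the versions are not interchangeable and the timings compare different computations.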
Please tell/show me if you would implement the code in a different way!
One final QUESTION (answered): How could FooBar's and dmvianna's versions be made "generic"? I mean, I had to write the name of the column into the function (since using it as a variable didn't work). Please explain this last point! --> Thanks jojo, ".loc" isn't the right way, but a very simple df[colname] is sufficient. I changed the functions above to be more "generic" (and also changed ">" to "<=" and updated the timings).
Thank you very much!!
If we are talking about arrays:
import numpy as np
# np.float was removed in NumPy 1.24; the built-in float is equivalent here
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)
print(1 / a[a <= 0.5] * (-1))
This will, however, only return the transformed values for the elements less than or equal to 0.5, not the full array.
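To keep the full array, the same boolean mask can be used on the left-hand side of an assignment. A minimal sketch, reusing the array from above (the names `mask` and `subset` are only for illustration):

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)

mask = a <= 0.5          # boolean mask of the elements to change
subset = -1 / a[mask]    # only the selected elements, transformed

a[mask] = -1 / a[mask]   # write back in place; a keeps its original length
print(subset.shape, a.shape)
```

Note that this modifies `a` itself; work on `a.copy()` if the original values are still needed.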
Alternatively, use np.where, which returns the full array:
import numpy as np
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)
# "<=" so that 0.5 itself is transformed too, as the question asks
print(np.where(a <= 0.5, 1 / a * (-1), a))
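One thing to keep in mind: np.where receives both branches already computed, so 1 / a is evaluated for every element, and a zero in the data (the question's column "A" starts at 0.0) triggers a divide-by-zero RuntimeWarning and ends up as -inf. The sketch below is one possible way around that, using the `out` and `where` arguments of np.divide. Leaving zeros unchanged is purely my assumption here; the question does not say what should happen to them.

```python
import numpy as np

# Assumed sample containing a zero.
a = np.array([0.0, 0.25, 0.5, 1.0, 2.0])

# Divide only where the value is <= 0.5 and non-zero.
safe = (a <= 0.5) & (a != 0)

# Positions where `safe` is False keep whatever is already in `out`,
# which is a copy of the original values.
result = np.divide(-1.0, a, out=a.copy(), where=safe)
print(result)
```

If -inf is actually the desired result for zeros, the plain np.where call wrapped in `with np.errstate(divide="ignore"):` would be the simpler choice.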
Talking about a pandas DataFrame:
As in @dmvianna's answer (so give some credit to him ;) ), adapted to a pd.DataFrame, here using column "A" from the question:
df["A"] = df["A"].where(df["A"] > 0.5, (1 / df["A"]) * (-1))
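Since Series.where keeps the values where the condition holds, the condition above reads "backwards" compared to the question. Series.mask is the inverse (it replaces values where the condition is True), so the condition can be written as stated in the question. A small sketch with an assumed sample DataFrame that contains no zeros:

```python
import numpy as np
import pandas as pd

# Assumed sample data; starts at 0.1 to avoid dividing by zero.
df = pd.DataFrame({"A": np.arange(0.1, 3.0, 0.1)})
original = df["A"].copy()

# Series.mask replaces values where the condition is True.
df["A"] = df["A"].mask(df["A"] <= 0.5, -1 / df["A"])

# Same result as the where() form with the inverted condition.
same = np.allclose(df["A"], original.where(original > 0.5, -1 / original))
print(same)
```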