Alternatives to pandas apply due to MemoryError
Question
I have a function that I wish to apply to a dataframe:
def DetermineMid(data, ts):
    if data['U'] == 0 and data['D'] > 0:
        mid = data['C'] + ts / 2
    elif data['U'] > 0 and data['D'] == 0:
        mid = data['C'] - ts / 2
    else:
        diff = data['A'] - data['B']
        if diff == 0:
            mid = data['C'] + 1
        else:
            mid = data['C']
    return mid
My df columns are A, B, C, D, U.
My call is as follows:
df = df.apply(DetermineMid, args=(5, ), axis=1)
On smaller dataframes this works just fine, but for this dataframe:
DatetimeIndex: 2561527 entries, 2016-11-30 17:00:01 to 2017-11-29 16:00:00
Data columns (total 6 columns):
Z    float64
A    float64
B    float64
C    float64
U    int64
D    int64
dtypes: float64(5), int64(2)
memory usage: 156.3 MB
None
I receive a MemoryError. Am I using apply incorrectly? I would have thought apply is just iterating through the rows and creating a value mid based on row values, then dropping all the old values as I do not care about them anymore.
Is there a better way to do this?
Recommended answer
Use np.select, i.e.:
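import numpy as np

ts = 5  # the same value the question passes via args=(5, )
m1 = (df['U'] == 0) & (df['D'] > 0)
m2 = (df['U'] > 0) & (df['D'] == 0)
m3 = (df['A'] - df['B']) == 0
# np.select uses the first condition that is True; rows matching none get df['C']
mid = np.select([m1, m2, m3], [df['C'] + ts / 2, df['C'] - ts / 2, df['C'] + 1], df['C'])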
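To check that the np.select version reproduces the row-wise function, here is a minimal self-contained sketch. The sample data and the names determine_mid_rowwise, determine_mid_vectorized, and sample are made up for illustration and are not from the question. It relies on np.select picking the first condition that holds, which is what lets the A - B == 0 test stand in for the nested else branch.

```python
import numpy as np
import pandas as pd


def determine_mid_rowwise(row, ts):
    """Row-wise logic from the question, kept only for comparison."""
    if row['U'] == 0 and row['D'] > 0:
        return row['C'] + ts / 2
    if row['U'] > 0 and row['D'] == 0:
        return row['C'] - ts / 2
    if row['A'] - row['B'] == 0:
        return row['C'] + 1
    return row['C']


def determine_mid_vectorized(df, ts):
    """Same logic with np.select, evaluated on whole columns at once."""
    conditions = [
        (df['U'] == 0) & (df['D'] > 0),
        (df['U'] > 0) & (df['D'] == 0),
        (df['A'] - df['B']) == 0,
    ]
    choices = [df['C'] + ts / 2, df['C'] - ts / 2, df['C'] + 1]
    # The first matching condition wins; rows matching none fall back to df['C'].
    values = np.select(conditions, choices, default=df['C'])
    return pd.Series(values, index=df.index, name='mid')


# Hypothetical sample data: one row for each branch of the original function.
sample = pd.DataFrame({
    'A': [1.0, 1.0, 2.0, 3.0],
    'B': [0.5, 0.5, 2.0, 1.0],
    'C': [10.0, 20.0, 30.0, 40.0],
    'U': [0, 3, 1, 1],
    'D': [4, 0, 1, 1],
})

rowwise = sample.apply(determine_mid_rowwise, args=(5,), axis=1)
vectorized = determine_mid_vectorized(sample, 5)
print(vectorized.tolist())
```

Because the vectorized version works on a handful of full-length arrays instead of building a new Series object for every row, it should need far less overhead than apply with axis=1, although the actual peak memory depends on the data and the pandas version.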