Speeding up Pandas apply function

Published: 2020/5/24 2:13:41 | Tags: python, performance, pandas, apply


Question

For a relatively big Pandas DataFrame (a few hundred thousand rows), I'd like to create a Series that is the result of an apply function. The problem is that the function is not very fast, and I was hoping it could be sped up somehow.

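import math

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value-1': [1, 2, 3, 4, 5],
    'value-2': [0.1, 0.2, 0.3, 0.4, 0.5],
    'value-3': [10, 20, 30, 40, 50],       # illustrative stand-in: original elides these ("somenumbers...")
    'value-4': [100, 200, 300, 400, 500],  # illustrative stand-in: original elides these ("more numbers...")
    'choice-index': [1, 1, np.nan, 2, 1],  # NaN means "no choice" for that row
})

def func(row):
    # choice-index is a float column (because of the NaN), so cast before building the name
    i = row['choice-index']
    return np.nan if math.isnan(i) else row['value-%d' % int(i)]

# reduce=True is omitted: pandas deprecated it in 0.23 (in favor of result_type) and later removed it
df['value'] = df.apply(func, axis=1)

# expected value = [1, 2, np.nan, 0.4, 5]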

Any suggestions are welcome.

Update

A very small speedup (~1.1×) can be achieved by pre-caching the selected column names. func then changes to:

cached_columns = [None, 'value-1', 'value-2', 'value-3', 'value-4']

def func(row):
    i = row['choice-index']
    # i is a float, so it must be cast to int before it can index the list
    return np.nan if math.isnan(i) else row[cached_columns[int(i)]]

But I was hoping for a much larger speedup.
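A plausible reason pre-caching helps so little: `apply(axis=1)` builds a pandas Series for every row and calls `func` on it, so the cost of formatting the column name is small next to that per-row overhead. The sketch below is an editorial illustration, not part of the original thread; it swaps in a different technique (a plain list comprehension over NumPy arrays) that keeps the same per-row logic but never builds a row Series. The `value-3`/`value-4` data and the `columns`/`choices` names are hypothetical stand-ins.

```python
import math

import numpy as np
import pandas as pd

# Same frame as in the question; value-3/value-4 hold hypothetical stand-in data
df = pd.DataFrame({
    'value-1': [1, 2, 3, 4, 5],
    'value-2': [0.1, 0.2, 0.3, 0.4, 0.5],
    'value-3': [10, 20, 30, 40, 50],
    'value-4': [100, 200, 300, 400, 500],
    'choice-index': [1, 1, np.nan, 2, 1],
})

# Extract each value column once as a NumPy array; slot 0 is unused so that
# a 1-based choice-index k maps directly to columns[k] (hypothetical layout)
columns = [None] + [df['value-%d' % k].to_numpy() for k in range(1, 5)]
choices = df['choice-index'].to_numpy()

# Same per-row logic as func, but reading from arrays instead of a row Series
df['value'] = [
    np.nan if math.isnan(i) else columns[int(i)][row]
    for row, i in enumerate(choices)
]
```

It still runs Python code once per row, so it should not be expected to come close to a fully vectorized solution.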

Recommended Answer

I think I have a good solution (~150× speedup).

The trick is not to use apply at all, but to select rows with boolean masks and assign each chosen column in bulk:

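df['value'] = np.nan  # rows whose choice-index is NaN keep this default
choice_indices = [1, 2, 3, 4]
for idx in choice_indices:
    # boolean mask selecting every row that picked column value-<idx>
    mask = df['choice-index'] == idx
    result_column = 'value-%d' % idx
    # copy the chosen column into 'value' for all those rows in one vectorized step
    df.loc[mask, 'value'] = df.loc[mask, result_column]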
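The mask loop still makes one pass per possible choice. Assuming all value columns are numeric and every non-NaN `choice-index` is an integer from 1 to 4, the whole selection can also be done in a single NumPy advanced-indexing step. This is a swapped-in technique, not the answerer's code, and no timings are implied; `value_cols`, `values`, `choice`, `valid`, `rows` and `result` are hypothetical names, and `value-3`/`value-4` hold stand-in data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value-1': [1, 2, 3, 4, 5],
    'value-2': [0.1, 0.2, 0.3, 0.4, 0.5],
    'value-3': [10, 20, 30, 40, 50],        # hypothetical stand-in data
    'value-4': [100, 200, 300, 400, 500],   # hypothetical stand-in data
    'choice-index': [1, 1, np.nan, 2, 1],
})

value_cols = ['value-1', 'value-2', 'value-3', 'value-4']
values = df[value_cols].to_numpy(dtype=float)  # shape (n_rows, 4)
choice = df['choice-index'].to_numpy()         # float array, may contain NaN

valid = ~np.isnan(choice)                      # rows that actually picked a column
result = np.full(len(df), np.nan)              # NaN wherever no choice was made
rows = np.flatnonzero(valid)                   # positions of those rows
# Pair each valid row with its chosen column (choice-index is 1-based, NumPy is 0-based)
result[valid] = values[rows, choice[valid].astype(int) - 1]
df['value'] = result
```

An index of 5 or more makes NumPy raise `IndexError`, but a `choice-index` of 0 would silently pick the last column (index -1), so validate the range first if such values can occur.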
