Using replace efficiently in pandas

Views: 169 | Published: 2018/8/2 13:40:18 | Tags: python pandas indexing dataframe series

Question

I am looking to use the replace function efficiently in Python 3. The code I have achieves the task, but it is much too slow because I am working with a large dataset. My priority is therefore efficiency over elegance whenever there is a tradeoff. Here is a toy example of what I would like to do:

import pandas as pd
df = pd.DataFrame([[1,2],[3,4],[5,6]], columns = ['1st', '2nd'])

       1st  2nd
   0    1    2
   1    3    4
   2    5    6


idxDict= dict()
idxDict[1] = 'a'
idxDict[3] = 'b'
idxDict[5] = 'c'

for k,v in idxDict.items():
    df['1st'] = df['1st'].replace(k, v)

This gives the result I want (the values 1, 3 and 5 in column '1st' become 'a', 'b' and 'c'), but it takes far too long. What would be the fastest way?

Edit: this is a more focused and cleaner question than this one, for which the solution is similar.

Recommended answer

Use map (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html#pandas.Series.map) to perform a lookup:

In [46]:
df['1st'] = df['1st'].map(idxDict)
df
Out[46]:
  1st  2nd
0   a    2
1   b    4
2   c    6

Note that map returns NaN for any value that has no matching key in the dict. Passing na_action='ignore' only makes map skip values that are already NaN; it does not preserve values that are missing from the dict.
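As a minimal sketch of how map behaves when a value has no key in the dict: the Series `s` and the dict `lookup` below are hypothetical and not from the original post. Unmatched values come back as NaN, and chaining `fillna(s)` is one possible way to fall back to the original value.

```python
import pandas as pd

# Hypothetical data: 7 has no entry in the lookup dict.
s = pd.Series([1, 3, 5, 7])
lookup = {1: 'a', 3: 'b', 5: 'c'}

# map returns NaN wherever the value is not a key of the dict.
mapped = s.map(lookup)

# One way to keep unmatched values: fill the NaNs from the original Series.
kept = s.map(lookup).fillna(s)

print(mapped.isna().tolist())
print(kept.tolist())
```

By contrast, `s.replace(lookup)` leaves unmatched values untouched, so which call fits depends on whether every value is expected to have a key.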

You can also use df['1st'].replace(idxDict), but to answer your question about efficiency:

Timings

In [69]:
%timeit df['1st'].replace(idxDict)
%timeit df['1st'].map(idxDict)

1000 loops, best of 3: 1.57 ms per loop
1000 loops, best of 3: 1.08 ms per loop

In [70]:
%%timeit
for k,v in idxDict.items():
    df['1st'] = df['1st'].replace(k, v)

100 loops, best of 3: 3.25 ms per loop

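So on this toy frame, map is about 3x faster than the loop (1.08 ms vs 3.25 ms) and roughly 1.5x faster than a single replace call with the dict (1.57 ms).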

On a larger dataset:

In [3]:
df = pd.concat([df]*10000, ignore_index=True)
df.shape

Out[3]:
(30000, 2)

In [4]:    
%timeit df['1st'].replace(idxDict)
%timeit df['1st'].map(idxDict)

100 loops, best of 3: 18 ms per loop
100 loops, best of 3: 4.31 ms per loop

In [5]:
%%timeit
for k,v in idxDict.items():
    df['1st'] = df['1st'].replace(k, v)

100 loops, best of 3: 18.2 ms per loop

For a 30K-row df, map is ~4x faster (4.31 ms vs about 18 ms), so it scales better than replace or looping.
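The comparison above can be reproduced outside IPython with the standard timeit module. This is only a sketch: the helper names `with_replace`, `with_map` and `with_loop` are made up for illustration, the loop variant works on a local copy of the column instead of assigning back to df, and the timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

idxDict = {1: 'a', 3: 'b', 5: 'c'}
base = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['1st', '2nd'])
df = pd.concat([base] * 10000, ignore_index=True)  # 30,000 rows


def with_replace():
    # Single replace call with the whole dict.
    return df['1st'].replace(idxDict)


def with_map():
    # Dictionary lookup via map.
    return df['1st'].map(idxDict)


def with_loop():
    # One replace call per key, as in the question (without mutating df).
    col = df['1st']
    for k, v in idxDict.items():
        col = col.replace(k, v)
    return col


# All three approaches should produce the same values.
same_result = with_replace().tolist() == with_map().tolist() == with_loop().tolist()
print("same result:", same_result)

for fn in (with_replace, with_map, with_loop):
    seconds = timeit.timeit(fn, number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```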
