Difference between map, applymap and apply methods in Pandas
Question
Can you tell me when to use these vectorization methods, with basic examples?

I see that `map` is a `Series` method whereas the rest are `DataFrame` methods. I got confused about the `apply` and `applymap` methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!
Comparing `map`, `applymap` and `apply`: Context Matters
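First major difference: DEFINITION
- `map` is defined on Series ONLY
- `applymap` is defined on DataFrames ONLY
- `apply` is defined on BOTH

(Since pandas 2.1, `DataFrame.applymap` is deprecated in favour of the equivalent `DataFrame.map`; the comparison here uses the older naming.)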
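As a quick check of where each method lives, here is a minimal sketch. The objects `s` and `df` are made up for illustration, and the `DataFrame` check allows for either method name because of the pandas 2.1 rename.

```python
import pandas as pd

# Illustrative data, not from the original question.
s = pd.Series([1, 2, 3])
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# A Series offers map and apply, but has no applymap.
series_methods = (hasattr(s, "map"), hasattr(s, "apply"), hasattr(s, "applymap"))

# A DataFrame offers apply plus an elementwise method, which is called
# applymap in older pandas and DataFrame.map from pandas 2.1 onwards.
frame_has_apply = hasattr(df, "apply")
frame_has_elementwise = hasattr(df, "applymap") or hasattr(df, "map")

print(series_methods, frame_has_apply, frame_has_elementwise)
```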
Second major difference: INPUT ARGUMENT
- `map` accepts `dict`s, `Series`, or callables
- `applymap` and `apply` accept callables only
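The three kinds of argument that `Series.map` accepts can be sketched as follows. The data and variable names are invented for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# 1. A dict: each value is looked up as a key.
by_dict = s.map({1: "a", 2: "b", 3: "c"})

# 2. A Series: each value is looked up in the other Series' index.
lookup = pd.Series(["a", "b", "c"], index=[1, 2, 3])
by_series = s.map(lookup)

# 3. A callable: called once per element.
by_callable = s.map(lambda x: x * 10)

print(by_dict.tolist(), by_series.tolist(), by_callable.tolist())
```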
Third major difference: BEHAVIOR
- `map` is elementwise for Series
- `applymap` is elementwise for DataFrames
- `apply` also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.
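To illustrate how the result of `apply` depends on the function, here is a small sketch with made-up data. On a Series the function receives one element at a time. On a DataFrame it receives a whole column, or a whole row with `axis=1`.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Series.apply: the function sees one element at a time.
squared = s.apply(lambda x: x ** 2)

# DataFrame.apply: the function sees one column at a time; returning a
# scalar per column gives a Series indexed by column name.
col_ranges = df.apply(lambda col: col.max() - col.min())

# With axis=1 the function sees one row at a time instead.
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)

print(squared.tolist(), col_ranges.to_dict(), row_sums.tolist())
```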
Fourth major difference (the most important one): USE CASE
- `map` is meant for mapping values from one domain to another, so is optimised for performance (e.g., `df['A'].map({1:'a', 2:'b', 3:'c'})`)
- `applymap` is good for elementwise transformations across multiple rows/columns (e.g., `df[['A', 'B', 'C']].applymap(str.strip)`)
- `apply` is for applying any function that cannot be vectorised (e.g., `df['sentences'].apply(nltk.sent_tokenize)`).
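The `applymap` and `apply` use cases above might look like this in practice. This is only a sketch: the data is invented, `split_sentences` is a hypothetical stand-in for `nltk.sent_tokenize` so that the example has no extra dependency, and the elementwise call picks `DataFrame.map` where it exists because `applymap` is deprecated from pandas 2.1.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [" x ", "y  "],
    "B": ["  p", " q "],
    "sentences": ["One. Two.", "Three."],
})

# Elementwise transformation across several columns.
sub = df[["A", "B"]]
elementwise = sub.map if hasattr(sub, "map") else sub.applymap
stripped = elementwise(str.strip)

def split_sentences(text):
    # Hypothetical, naive stand-in for nltk.sent_tokenize: split on full stops.
    return [part.strip() + "." for part in text.split(".") if part.strip()]

# A function with no vectorised equivalent, applied to each value.
tokens = df["sentences"].apply(split_sentences)

print(stripped)
print(tokens.tolist())
```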
Also see "When should I (not) want to use pandas apply() in my code?" for a writeup I made a while back on the most appropriate scenarios for using `apply` (note that there aren't many, but there are a few; `apply` is generally slow).
Summarising
Footnotes
- `map`, when passed a dictionary/Series, will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
- `applymap` in more recent versions has been optimised for some operations. You will find `applymap` slightly faster than `apply` in some cases. My suggestion is to test them both and use whatever works better.
- `map` is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
- `Series.apply` returns a scalar for aggregating operations, Series otherwise. Similarly for `DataFrame.apply`. Note that `apply` also has fastpaths when called with certain NumPy functions such as `mean`, `sum`, etc.
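The first footnote, on missing keys, can be seen in a short sketch with made-up data. The value `4` has no entry in the dictionary, so it comes out as NaN.

```python
import pandas as pd

s = pd.Series([1, 2, 4])

# 4 is not a key in the mapping, so the result holds NaN at that position.
mapped = s.map({1: "a", 2: "b", 3: "c"})

print(mapped)
print(mapped.isna().tolist())
```

If NaN is not wanted, one option is to chain `.fillna(...)` with a default value after the `map` call.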