Speed difference between bracket notation and dot notation for accessing columns in pandas
Question
Let's have a small dataframe: df = pd.DataFrame({'CID': [1,2,3,4,12345, 6]})
When I search for membership, the speed differs greatly depending on whether I search in df.CID or in df['CID'].
In [25]: %timeit 12345 in df.CID
Out[25]: 89.8 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [26]: %timeit 12345 in df['CID']
Out[26]: 42.3 µs ± 334 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [27]: type(df.CID)
Out[27]: pandas.core.series.Series
In [28]: type(df['CID'])
Out[28]: pandas.core.series.Series
Why is that?
Recommended answer
df['CID'] delegates to NDFrame.__getitem__, and it is more obvious that you are performing an indexing operation.
On the other hand, df.CID delegates to NDFrame.__getattr__, which has to do some additional heavy lifting, mainly to determine whether 'CID' is an attribute, a function, or a column that you are reaching through attribute access (a convenience, but not recommended for production code).
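To isolate the lookup cost from the membership test, the two spellings can be timed on their own. The following is a minimal sketch using the standard-library timeit module rather than IPython's %timeit; the variable names are illustrative, and the absolute numbers depend on the pandas version and the machine. Note also that `in` on a Series checks the index labels rather than the values, so the question's benchmark is mostly measuring the column access itself.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'CID': [1, 2, 3, 4, 12345, 6]})

# Both spellings return the same column data.
same = df.CID.equals(df['CID'])

# Time only the column lookup, with no membership test involved.
n = 10_000
t_dot = timeit.timeit(lambda: df.CID, number=n)
t_bracket = timeit.timeit(lambda: df['CID'], number=n)
print(f"dot:     {t_dot / n * 1e6:.2f} µs per access")
print(f"bracket: {t_bracket / n * 1e6:.2f} µs per access")

# `in` on a Series tests the index labels (0..5 here), not the values.
in_index = 12345 in df['CID']
# Testing the values requires going through .values (or .isin / .eq).
in_values = 12345 in df['CID'].values
print(in_index, in_values)
```

If the goal is really to test whether a value occurs in the column, `in_values` is the check that answers it.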
Now, why is it not recommended? Consider:
df = pd.DataFrame({'A': [1, 2, 3]})
df.A
0    1
1    2
2    3
Name: A, dtype: int64
There are no issues referring to column "A" as df.A, because it does not conflict with the name of any attribute or function in pandas. However, consider the pop function (just as an example).
df.pop
# <bound method NDFrame.pop of ...>
df.pop is a bound method of df. Now, suppose I'd like to create a column called "pop" for whatever reason.
df['pop'] = [4, 5, 6]
df
   A  pop
0  1    4
1  2    5
2  3    6
Looks good, but:
df.pop
# <bound method NDFrame.pop of ...>
I cannot use the attribute notation to access this column. However...
df['pop']
0    4
1    5
2    6
Name: pop, dtype: int64
Bracket notation still works. That's why bracket notation is the better choice.
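The name clash above can be checked up front. Here is a rough sketch of a helper that lists the columns dot notation cannot reach; shadowed_columns is a hypothetical name, not part of pandas, and the check is not exhaustive (it ignores Python keywords such as "class", for instance). The 'unit price' column is added only to illustrate a name that is not a valid identifier.

```python
import pandas as pd


def shadowed_columns(df):
    """Return column names that dot notation cannot reach.

    Hypothetical helper for illustration; not part of pandas.
    """
    return [
        col for col in df.columns
        if not isinstance(col, str)        # non-string labels, e.g. integers
        or not col.isidentifier()          # spaces, leading digits, etc.
        or hasattr(pd.DataFrame, col)      # clashes with a DataFrame attribute or method
    ]


df = pd.DataFrame({'A': [1, 2, 3], 'pop': [4, 5, 6], 'unit price': [7, 8, 9]})
blocked = shadowed_columns(df)
print(blocked)  # ['pop', 'unit price']
```

Every column in the frame stays reachable through df[...] whether or not it appears in this list.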