Speed difference between bracket notation and dot notation for accessing columns in pandas

Question

Let's take a small DataFrame: df = pd.DataFrame({'CID': [1,2,3,4,12345, 6]})

When I test for membership, the speed differs considerably depending on whether I search in df.CID or in df['CID'].

In [25]: %timeit 12345 in df.CID
Out[25]: 89.8 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [26]: %timeit 12345 in df['CID']
Out[26]: 42.3 µs ± 334 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [27]: type(df.CID)
Out[27]: pandas.core.series.Series

In [28]: type(df['CID'])
Out[28]: pandas.core.series.Series

Why is that?

Recommended answer

df['CID'] delegates to NDFrame.__getitem__, and it makes it more obvious that you are performing an indexing operation.

On the other hand, df.CID delegates to NDFrame.__getattr__, which has to do some additional heavy lifting, mainly to determine whether 'CID' is an attribute, a function, or a column that you are reaching through attribute access (a convenience, but not recommended for production code).
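To make the extra step concrete, here is a minimal sketch in plain Python. ToyFrame and its getattr_calls counter are hypothetical names invented for illustration; this is not how pandas is implemented. It only relies on a general Python rule: __getattr__ is a fallback that runs after normal attribute lookup has already failed, while __getitem__ is called directly.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not pandas' real implementation."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self.getattr_calls = 0  # counts how often the fallback path runs

    def __getitem__(self, key):
        # Bracket access: one direct column lookup.
        return self._columns[key]

    def __getattr__(self, name):
        # Python only calls this after normal attribute lookup has failed.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursing on private names
        self.getattr_calls += 1
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name) from None


frame = ToyFrame({"CID": [1, 2, 3, 4, 12345, 6]})

via_brackets = frame["CID"]  # goes straight to __getitem__
via_dot = frame.CID          # normal lookup fails first, then __getattr__ runs

print(via_brackets is via_dot)  # same underlying object either way
print(frame.getattr_calls)      # the fallback ran once, for frame.CID
```

Both spellings return the same object; the dot form simply takes a longer route to get there.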

Now, why is it not recommended? Consider:

df = pd.DataFrame({'A': [1, 2, 3]})
df.A

0    1
1    2
2    3
Name: A, dtype: int64

There are no issues referring to column "A" as df.A, because the name does not conflict with any attribute or method name in pandas. However, consider the pop method (just as an example).

df.pop
# <bound method NDFrame.pop of ...>

df.pop is a bound method of df. Now, suppose I want to create a column called "pop" for whatever reason.

df['pop'] = [4, 5, 6]
df
   A  pop
0  1    4
1  2    5
2  3    6

All good, but

df.pop
# <bound method NDFrame.pop of ...>

I cannot use attribute notation to access this column. However...

df['pop']

0    4
1    5
2    6
Name: pop, dtype: int64

Bracket notation still works. That's why this is better.
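If you want to find out which columns in a frame are affected, a rough check is to ask whether the column name already exists as an attribute on the DataFrame class. The helper below, shadowed_columns, is a hypothetical name of my own and not part of pandas. It is only a sketch: it catches name collisions such as "pop", but it does not cover other reasons dot access can fail, such as column names that are not valid Python identifiers.

```python
import pandas as pd


def shadowed_columns(df):
    """Return string column names that collide with DataFrame attributes.

    Hypothetical helper for illustration; not a pandas API.
    """
    return [
        col
        for col in df.columns
        if isinstance(col, str) and hasattr(pd.DataFrame, col)
    ]


df = pd.DataFrame({"A": [1, 2, 3], "pop": [4, 5, 6]})

# "pop" collides with DataFrame.pop, so df.pop is still the method.
print(shadowed_columns(df))

# Bracket notation reaches the column regardless of the collision.
print(df["pop"].tolist())
```

Any name this reports can only be reached reliably with brackets.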
