pandas .at versus .loc

Question

I've been exploring how to optimize my code and ran across the pandas .at method. Per the documentation:

Fast label-based scalar accessor

Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.
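
A minimal sketch of what that quote describes, on a small made-up frame (the labels and values are illustrative only): .at reads and writes one cell by row label and column label, and .loc accepts the same pair of labels.

```python
import pandas as pd

# Illustrative 2x2 frame with string labels on both axes.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}, index=["a", "b"])

# Label-based scalar lookup: one row label, one column label.
value_at = df.at["b", "A"]
value_loc = df.loc["b", "A"]  # .loc accepts the same scalar key

# Label-based scalar assignment, modifying df in place.
df.at["a", "B"] = 30.0

print(value_at, value_loc, df.at["a", "B"])
```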

So I ran some samples:

import string

import numpy as np
import pandas as pd

# Python 3 equivalents of Python 2's string.letters/lowercase/uppercase
lt = list(string.ascii_letters)
lc = list(string.ascii_lowercase)
uc = list(string.ascii_uppercase)

def gdf(rows, cols, seed=None):
    """rows and cols are what you'd pass
    to pd.MultiIndex.from_product()"""
    gmi = pd.MultiIndex.from_product
    index, columns = gmi(rows), gmi(cols)
    np.random.seed(seed)
    # Build the frame from the random array directly so the columns are float64
    data = np.random.rand(len(index), len(columns))
    return pd.DataFrame(data, index=index, columns=columns)

seed = [3, 1415]
df = gdf([lc, uc], [lc, uc], seed)

print(df.head().T.head().T)

df looks like:

            a                                        
            A         B         C         D         E
a A  0.444939  0.407554  0.460148  0.465239  0.462691
  B  0.032746  0.485650  0.503892  0.351520  0.061569
  C  0.777350  0.047677  0.250667  0.602878  0.570528
  D  0.927783  0.653868  0.381103  0.959544  0.033253
  E  0.191985  0.304597  0.195106  0.370921  0.631576

Let's use .at and .loc and make sure I get the same thing:

print("using .loc", df.loc[('a', 'A'), ('c', 'C')])
print("using .at ", df.at[('a', 'A'), ('c', 'C')])

using .loc 0.37374090276
using .at  0.37374090276

Using .loc:

%%timeit
df.loc[('a', 'A'), ('c', 'C')]

10000 loops, best of 3: 180 µs per loop

Using .at:

%%timeit
df.at[('a', 'A'), ('c', 'C')]

The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8 µs per loop

This looks like a huge speed-up. Even taking the slowest run at face value (6.11 × 8 µs ≈ 49 µs), .at is still much faster than .loc at 180 µs.
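
Outside IPython, the same comparison can be reproduced with the standard timeit module. This sketch assumes a smaller, hypothetical 25 x 25 frame with a two-level MultiIndex on both axes rather than the question's full frame; the absolute numbers, and the size of the gap between the two accessors, will depend on the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: a smaller 2-level
# MultiIndex on both axes, filled with random floats.
letters = list("abcde")
caps = list("ABCDE")
index = pd.MultiIndex.from_product([letters, caps])
columns = pd.MultiIndex.from_product([letters, caps])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((len(index), len(columns))), index=index, columns=columns)

row, col = ("a", "A"), ("c", "C")

# Both accessors should return the same scalar before we time them.
assert df.loc[row, col] == df.at[row, col]

n = 2_000
loc_time = timeit.timeit(lambda: df.loc[row, col], number=n)
at_time = timeit.timeit(lambda: df.at[row, col], number=n)

print(f".loc: {loc_time / n * 1e6:.2f} µs per call")
print(f".at:  {at_time / n * 1e6:.2f} µs per call")
```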

What are the limitations of .at? I'm motivated to use it. The documentation says it's similar to .loc but it doesn't behave similarly. Example:

# small df
sdf = gdf([lc[:2]], [uc[:2]], seed)

print(sdf.loc[:, :])

          A         B
a  0.444939  0.407554
b  0.460148  0.465239

whereas print(sdf.at[:, :]) results in TypeError: unhashable type

So the two are clearly not the same, even if the intent is similar.
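
A small sketch of that limitation on a made-up frame shaped like sdf: .at expects exactly one row label and one column label, so a key made of slices is rejected, while .loc accepts it. The exact exception class and message vary between pandas and Python versions (the question reports TypeError: unhashable type), so the sketch catches and reports whatever is raised instead of assuming one type.

```python
import pandas as pd

# Illustrative frame with the same labels as sdf in the question.
sdf = pd.DataFrame(
    [[0.444939, 0.407554], [0.460148, 0.465239]],
    index=["a", "b"],
    columns=["A", "B"],
)

# .loc accepts slices and returns the whole 2x2 block.
block = sdf.loc[:, :]

# .at only accepts one scalar label per axis; slices are rejected.
try:
    sdf.at[:, :]
    at_failed = False
except Exception as exc:  # the exception type differs across versions
    at_failed = True
    print(f".at[:, :] raised {type(exc).__name__}: {exc}")

print(block.shape, at_failed)
```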

That said, can anyone provide guidance on what can and cannot be done with the .at method?

Recommended answer

Update: df.get_value was deprecated in version 0.21.0 and removed in pandas 1.0. Use df.at or df.iat instead.
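
A sketch of that replacement on a made-up frame: a former df.get_value(row, col) call becomes df.at[row, col], and df.iat reaches the same cell by integer position instead of by label.

```python
import pandas as pd

# Illustrative frame with string labels.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Formerly: df.get_value("y", "B")
by_label = df.at["y", "B"]  # row label "y", column label "B"

# Same cell by position: second row (1), second column (1).
by_position = df.iat[1, 1]

print(by_label, by_position)
```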

df.at can only access a single value at a time.

df.loc can select multiple rows and/or columns.
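
To make the contrast concrete, here is a sketch on a small made-up frame: .at returns one scalar for one row label and one column label, while .loc also takes label slices and lists and returns a DataFrame or Series.

```python
import pandas as pd

# Small illustrative frame (labels and values are made up).
sdf = pd.DataFrame(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
    index=["a", "b", "c"],
    columns=["A", "B", "C"],
)

# .at: exactly one row label and one column label -> one scalar.
cell = sdf.at["b", "C"]

# .loc: slices and lists of labels -> a DataFrame or Series.
block = sdf.loc["a":"b", ["A", "C"]]  # .loc label slices include both endpoints
column = sdf.loc[:, "B"]
everything = sdf.loc[:, :]

print(cell)
print(block)
```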

Note that there is also df.get_value, which may be even quicker at accessing single values:

In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 187 µs per loop

In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
100000 loops, best of 3: 8.33 µs per loop

In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
100000 loops, best of 3: 3.62 µs per loop

Under the hood, df.at[...] calls df.get_value, but it also does some type checking on the keys.
