Pandas iloc vs direct slicing?


Question

I've read a lot of discussion about iloc vs loc and I understand the difference between them, but what I don't understand is the difference between:

indexed_data['var'][0:10]

vs

indexed_data['var'].iloc[0:10]

These seem to be the same thing and give the same output.

Am I missing something? Thanks!

Recommended answer

In earlier versions of pandas, this kind of mixed selection (rows by position, column by label) could be done with the ix indexer.

But from pandas 0.20 onward the ix indexer is deprecated (it was later removed in pandas 1.0).

So use get_loc to get the position of the var column, and select with iloc only:

indexed_data.iloc[0:10, indexed_data.columns.get_loc('var')]
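As a minimal sketch of the get_loc approach, the snippet below builds a small stand-in for indexed_data (the two-column frame and the name `other` are assumptions made for illustration, not part of the question) and checks that the single iloc call selects the same rows as selecting the column first:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's indexed_data.
indexed_data = pd.DataFrame(
    {"other": np.arange(20) * 2, "var": np.arange(20)}
)

# Integer position of the 'var' column (the second column in this frame).
var_pos = indexed_data.columns.get_loc("var")

# One indexing call: first ten rows and the 'var' column, both by position.
single_call = indexed_data.iloc[0:10, var_pos]

# Column selected first, then a positional row slice on the resulting Series.
two_step = indexed_data["var"].iloc[0:10]

print(single_call.equals(two_step))
```

For reading data the two forms return the same Series here; the single call mainly matters once the selection is used for assignment.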


In my opinion, the difference between:

indexed_data['var'][0:10]

and:

indexed_data['var'].iloc[0:10]

is mainly in the ][. I think it is best to avoid it, because it can lead to chained indexing.

Advice from Modern Pandas by Tom Augspurger (a pandas developer):

The rough rule is any time you see back-to-back square brackets, ][, you're in asking for trouble. Replace that with a .loc[..., ...] and you'll be set.

So it is best to use the native pandas indexers, such as loc and iloc, here.
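The back-to-back bracket rule matters most for assignment, where a chained form such as indexed_data['var'][0:3] = ... may or may not modify the original frame depending on the pandas version and settings. The following is a rough sketch of the single-call alternatives, using a small hypothetical frame (the `other` column is an assumption added for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's indexed_data.
indexed_data = pd.DataFrame(
    {"var": np.arange(5), "other": np.arange(5) * 10}
)

# Positional: rows 0..2 of the 'var' column in one .iloc call.
indexed_data.iloc[0:3, indexed_data.columns.get_loc("var")] = -1

# Label-based: row labels taken from the index, plus the column label.
indexed_data.loc[indexed_data.index[3:5], "var"] = -2

var_values = indexed_data["var"].tolist()
other_values = indexed_data["other"].tolist()
print(var_values)
```

Both assignments act directly on indexed_data because each one is a single indexing operation rather than an operation on an intermediate Series.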

I then tried to compare the functions called by each method, but I stopped after about 40 minutes (a very large number of functions are called).

I checked the timings, and they differ for each approach:

import numpy as np
import pandas as pd

indexed_data = pd.DataFrame(np.random.randint(3, size=(2000000,1)), columns=['var'])

In [151]: %timeit indexed_data['var'].iloc[0:100000]
10000 loops, best of 3: 62.1 µs per loop

In [152]: %timeit indexed_data['var'][0:100000]
10000 loops, best of 3: 82.3 µs per loop

In [153]: %timeit indexed_data.iloc[0:100000, indexed_data.columns.get_loc('var')]
10000 loops, best of 3: 155 µs per loop

In [154]: %timeit indexed_data.loc[indexed_data.index[0:100000], 'var']
100 loops, best of 3: 7.36 ms per loop

# NumPy approach: the output is an array
In [155]: %timeit indexed_data['var'].values[0:100000]
100000 loops, best of 3: 5.35 µs per loop
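The %timeit magic is only available in IPython. As a rough sketch, the same comparison can be reproduced in a plain script with the standard timeit module. The row counts below are smaller than in the session above so that it runs quickly (an assumption), and the dictionary keys are illustrative names only. Absolute numbers will vary by machine and pandas version, so no expected timings are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 2,000,000 rows used above, so the sketch runs quickly.
n_rows, n_slice = 200_000, 10_000
indexed_data = pd.DataFrame(
    np.random.randint(3, size=(n_rows, 1)), columns=["var"]
)
var_pos = indexed_data.columns.get_loc("var")

candidates = {
    "series_iloc": lambda: indexed_data["var"].iloc[0:n_slice],
    "series_brackets": lambda: indexed_data["var"][0:n_slice],
    "frame_iloc": lambda: indexed_data.iloc[0:n_slice, var_pos],
    "frame_loc": lambda: indexed_data.loc[indexed_data.index[0:n_slice], "var"],
    "numpy_values": lambda: indexed_data["var"].values[0:n_slice],
}

# Best of 3 repeats with 100 calls each, converted to seconds per call.
timings = {
    name: min(timeit.repeat(func, number=100, repeat=3)) / 100
    for name, func in candidates.items()
}

# Sanity check: every approach should select the same values.
reference = indexed_data["var"].to_numpy()[0:n_slice]
same_values = all(
    np.array_equal(np.asarray(func()), reference)
    for func in candidates.values()
)

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:16s} {seconds * 1e6:10.1f} µs per call")
```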
