检索 pandas 系列中的元素有哪些不同的方法? [英] What are the different methods to retrieve elements in a pandas Series?

查看:48
本文介绍了检索 pandas 系列中的元素有哪些不同的方法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

至少有 4 种方法可以检索 pandas 系列中的元素:.iloc、.loc .ix 和直接使用 [] 运算符.

There are at least 4 ways to retrieve elements in a pandas Series: .iloc, .loc .ix and using directly the [] operator.

它们之间有什么区别?他们如何处理缺少标签/超出范围的位置?

What's the difference between them ? How do they handle missing labels/out of range positions ?

推荐答案

总体思路是,虽然 .iloc 和 .loc 保证分别按位置和索引(标签)执行查找,但它们有点慢而不是使用 .ix 或直接使用 [] 运算符.前两种方法根据要查找的系列中的索引类型和应查找的数据按索引或位置执行查找.

The general idea is that while .iloc and .loc are guaranteed to perform the look-up by position and index(label) respectively, they are a bit slower than using .ix or directly the [] operator. These two former methods perform the look-up by index or position depending of the type of index in the Series to look-up and the data that should be looked-up.

然而,在使用 .iloc 和 .loc 时也存在一些不一致之处,如本页所述.

There is however a bit of inconsistency also in using .iloc and .loc, as described in this page.

下表总结了这 4 种查找方法的行为,具体取决于 (a) 要查找的系列是否具有整数或字符串索引(我暂时不考虑日期索引),(b) 如果所需的数据是单个元素、切片索引或列表(是的,行为发生了变化!)和 (c) 是否在数据中找到索引.

The following tables summarise the behaviour of these 4 methods of lookup depending (a) if the Series to look-up has an integer or a string index (I do not consider for the moment the date index), (b) if the required data is a single element, a slice index or a list (yes, the behaviour change!) and (c) if the index is found or not in the data.

以下示例适用于 pandas 0.17.1、NumPy 1.10.4、Python 3.4.3.

The following examples works with pandas 0.17.1, NumPy 1.10.4, Python 3.4.3.

s = pd.Series(np.arange(100,105), index=np.arange(10,15))
s
10    100
11    101
12    102
13    103
14    104

** Single element **             ** Slice **                                       ** Tuple **
s[0]       -> LAB -> KeyError    s[0:2]        -> POS -> {10:100, 11:101}          s[[1,3]]        -> LAB -> {1:NaN, 3:Nan}
s[13]      -> LAB -> 103         s[10:12]      -> POS -> empty Series              s[[12,14]]      -> LAB -> {12:102, 14:104}
---                              ---                                               ---
s.ix[0]    -> LAB -> KeyError    s.ix[0:2]     -> LAB -> empty Series              s.ix[[1,3]]     -> LAB -> {1:NaN, 3:Nan}
s.ix[13]   -> LAB -> 103         s.ix[10:12]   -> LAB -> {10:100, 11:101, 12:102}  s.ix[[12,14]]   -> LAB -> {12:102, 14:104}
---                              ---                                               ---
s.iloc[0]  -> POS -> 100         s.iloc[0:2]   -> POS -> {10:100, 11:101}          s.iloc[[1,3]]   -> POS -> {11:101, 13:103}
s.iloc[13] -> POS -> IndexError  s.iloc[10:12] -> POS -> empty Series              s.iloc[[12,14]] -> POS -> IndexError
---                              ---                                               ---
s.loc[0]   -> LAB -> KeyError    s.loc[0:2]    -> LAB -> empty Series              s.loc[[1,3]]    -> LAB -> KeyError
s.loc[13]  -> LAB -> 103         s.loc[10:12]  -> LAB -> {10:100, 11:101, 12:102}  s.loc[[12,14]]  -> LAB -> {12:102, 14:104}

案例 2:带有字符串索引的系列

s = pd.Series(np.arange(100,105), index=['a','b','c','d','e'])
s
a    100
b    101
c    102
d    103
e    104

** Single element **                             ** Slice **                                           ** Tuple **
s[0]        -> POS -> 100                        s[0:2]          -> POS -> {'a':100,'b':101}           s[[0,2]]          -> POS -> {'a':100,'c':102} 
s[10]       -> LAB, POS -> KeyError, IndexError  s[10:12]        -> POS -> Empty Series                s[[10,12]]        -> POS -> IndexError 
s['a']      -> LAB -> 100                        s['a':'c']      -> LAB -> {'a':100,'b':101, 'c':102}  s[['a','c']]      -> LAB -> {'a':100,'b':101, 'c':102} 
s['g']      -> POS,LAB -> TypeError, KeyError    s['f':'h']      -> LAB -> Empty Series                s[['f','h']]      -> LAB -> {'f':NaN, 'h':NaN}
---                                              ---                                                   ---
s.ix[0]     -> POS -> 100                        s.ix[0:2]       -> POS -> {'a':100,'b':101}           s.ix[[0,2]]       -> POS -> {'a':100,'c':102} 
s.ix[10]    -> POS -> IndexError                 s.ix[10:12]     -> POS -> Empty Series                s.ix[[10,12]]     -> POS -> IndexError 
s.ix['a']   -> LAB -> 100                        s.ix['a':'c']   -> LAB -> {'a':100,'b':101, 'c':102}  s.ix[['a','c']]   -> LAB -> {'a':100,'b':101, 'c':102} 
s.ix['g']   -> POS, LAB -> TypeError, KeyError   s.ix['f':'h']   -> LAB -> Empty Series                s.ix[['f','h']]   -> LAB -> {'f':NaN, 'h':NaN}
---                                              ---                                                   ---
s.iloc[0]   -> POS -> 100                        s.iloc[0:2]     -> POS -> {'a':100,'b':101}           s.iloc[[0,2]]     -> POS -> {'a':100,'c':102} 
s.iloc[10]  -> POS -> IndexError                 s.iloc[10:12]   -> POS -> Empty Series                s.iloc[[10,12]]   -> POS -> IndexError 
s.iloc['a'] -> LAB -> TypeError                  s.iloc['a':'c'] -> POS -> ValueError                  s.iloc[['a','c']] -> POS -> TypeError    
s.iloc['g'] -> LAB -> TypeError                  s.iloc['f':'h'] -> POS -> ValueError                  s.iloc[['f','h']] -> POS -> TypeError
---                                              ---                                                   ---
s.loc[0]    -> LAB -> KeyError                   s.loc[0:2]     -> LAB -> TypeError                   s.loc[[0,2]]     -> LAB -> KeyError 
s.loc[10]   -> LAB -> KeyError                   s.loc[10:12]   -> LAB -> TypeError                   s.loc[[10,12]]   -> LAB -> KeyError 
s.loc['a']  -> LAB-> 100                         s.loc['a':'c'] -> LAB -> {'a':100,'b':101, 'c':102}  s.loc[['a','c']] -> LAB -> {'a':100,'c':102}    
s.loc['g']  -> LAB -> KeyError                   s.loc['f':'h'] -> LAB -> Empty Series                s.loc[['f','h']] -> LAB -> KeyError

请注意,有三种方法可以处理未找到的标签/超出范围的位置:抛出异常、返回空系列或返回具有与 NaN 值相关联的所需键的系列.

Note that there are three ways to handle not found labels/out of range positions: an exception is thrown, a null Series is returned or a Series with the demanded keys associated to NaN values is returned.

另请注意,当使用按位置切片查询时,会排除结束元素,但当按标签查询时,会包含结束元素.

Also note that when querying using slicing by position the end element is excluded, but when querying by label the ending element is included.

这篇关于检索 pandas 系列中的元素有哪些不同的方法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆