How does indexing work in Pandas?

Question

I am new to Python. This seems like a basic question, but I really want to understand what is happening here:

import numpy as np 
import pandas as pd 
tempdata = np.random.random(5)
myseries_one = pd.Series(tempdata)
myseries_two = pd.Series(data = tempdata, index = ['a','b','c','d','e'])
myseries_three = pd.Series(data = tempdata, index = [10,11,12,13,14])


myseries_one
Out[1]: 
0    0.291293
1    0.381014
2    0.923360
3    0.271671
4    0.605989
dtype: float64

myseries_two
Out[2]: 
a    0.291293
b    0.381014
c    0.923360
d    0.271671
e    0.605989
dtype: float64

myseries_three
Out[3]: 
10    0.291293
11    0.381014
12    0.923360
13    0.271671
14    0.605989
dtype: float64
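All three Series hold the same values. Only their index (the labels) differs. Here is a minimal sketch for inspecting what `[]` has to work with. It assumes fixed values in place of `np.random.random(5)` so the result is reproducible:

```python
import numpy as np
import pandas as pd

# Fixed values (assumption) instead of np.random.random(5), for reproducibility
tempdata = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
myseries_one = pd.Series(tempdata)
myseries_two = pd.Series(data=tempdata, index=['a', 'b', 'c', 'd', 'e'])
myseries_three = pd.Series(data=tempdata, index=[10, 11, 12, 13, 14])

for name, s in [("one", myseries_one), ("two", myseries_two), ("three", myseries_three)]:
    # The index holds the labels; positions are always 0 .. len(s) - 1
    print(name, s.index.tolist(), s.index.dtype)
```

The values are identical across the three Series. Only the labels printed from `.index` change.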

Indexing the first element of each Series:

myseries_one[0] #As expected
Out[74]: 0.29129291112626043

myseries_two[0] #As expected
Out[75]: 0.29129291112626043

myseries_three[0]
KeyError:0 

Doubt 1: Why is this happening? Why does myseries_three[0] give me a KeyError? What does calling myseries_one[0], myseries_two[0] or myseries_three[0] actually mean? Does indexing this way mean we are selecting by row name?

Doubt 2: Do row names and row numbers work differently in Python than they do in R?

myseries_one[0:2]
Out[78]: 
0    0.291293
1    0.381014
dtype: float64

myseries_two[0:2]
Out[79]: 
a    0.291293
b    0.381014
dtype: float64

myseries_three[0:2]
Out[80]: 
10    0.291293
11    0.381014
dtype: float64

Doubt 3: If myseries_three[0] means selecting by row name, then how does myseries_three[0:3] produce any output? Does myseries_three[0:4] mean we are selecting by row number? Please explain. I am migrating from R to Python, so this is a bit confusing for me.

Answer

When you index with myseries[something], the something is often ambiguous: it could be a label or a position. Your examples highlight that ambiguity, and in some of them pandas tries to help by guessing what you mean.

myseries_one[0] #As expected
Out[74]: 0.29129291112626043

myseries_one has integer labels. When you index it with an integer, it makes sense to assume you want the element labeled with that integer. It happens to have an element labeled 0, so that element is returned.
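In myseries_one, label 0 and position 0 coincide. The small sketch below separates them to show that `[]` on an integer index really does look up labels rather than positions. The reversed-label Series is a hypothetical example, not from the question:

```python
import pandas as pd

# Hypothetical Series whose integer labels run in reverse order
s = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = s[0]          # integer index: [] looks up the LABEL 0
by_position = s.iloc[0]  # iloc: the first POSITION
print(by_label, by_position)
```

If `[]` were positional here, `s[0]` would return 10. It returns 30 because that value carries the label 0.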

myseries_two[0] #As expected
Out[75]: 0.29129291112626043

myseries_two has string labels. It is highly unlikely that you meant to look up the label 0 when all the labels are strings, so pandas assumes you meant position 0 and returns the first element (thanks, pandas, that was helpful). Note that pandas 2.1 and later deprecate this positional fallback and emit a FutureWarning, so use iloc when you mean a position.
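Rather than relying on that guess, you can state the intent directly. Here is a minimal sketch with a hypothetical three-element string-labeled Series:

```python
import pandas as pd

# Hypothetical string-labeled Series
s = pd.Series([0.1, 0.2, 0.3], index=['a', 'b', 'c'])

# Say what you mean instead of relying on the positional fallback of s[0]
first_by_position = s.iloc[0]   # position 0
first_by_label = s.loc['a']     # label 'a'
print(first_by_position, first_by_label)
```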

myseries_three[0]
KeyError:0 

myseries_three has integer labels and you are indexing with an integer... perfect. Let's just get that value for you... KeyError. Whoops, the label 0 does not exist. In this case it is safer for pandas to fail than to guess that you meant a position. The documentation recommends removing the ambiguity by using loc for label-based indexing and iloc for position-based indexing.
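One way to check whether a label exists before indexing is `in`, which tests a Series' index labels (not its values or positions). This sketch uses the question's labels with fixed, hypothetical values:

```python
import pandas as pd

# Same labels as myseries_three; fixed values are an assumption
myseries_three = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=[10, 11, 12, 13, 14])

print(0 in myseries_three)    # False: there is no label 0
print(10 in myseries_three)   # True: label 10 exists

raised = False
try:
    myseries_three[0]
except KeyError:
    # An integer key on an integer index is treated as a label, so this fails
    raised = True
print(raised)
```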

Let's try loc:

myseries_one.loc[0]
0.29129291112626043

myseries_two.loc[0]
KeyError:0 

myseries_three.loc[0]
KeyError:0 

Only myseries_one has a label 0. The other two raise KeyError.
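Here is the same loc comparison written as a loop, using fixed hypothetical values in place of the random data:

```python
import pandas as pd

data = [0.1, 0.2, 0.3, 0.4, 0.5]  # fixed values (assumption)
series = {
    "one": pd.Series(data),
    "two": pd.Series(data, index=['a', 'b', 'c', 'd', 'e']),
    "three": pd.Series(data, index=[10, 11, 12, 13, 14]),
}

results = {}
for name, s in series.items():
    try:
        # loc is strictly label-based: it never falls back to positions
        results[name] = s.loc[0]
    except KeyError:
        results[name] = "KeyError"
print(results)
```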

Let's try iloc:

myseries_one.iloc[0]
0.29129291112626043

myseries_two.iloc[0]
0.29129291112626043

myseries_three.iloc[0]
0.29129291112626043

All three have an element at position 0, so each returns its first element.
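Because iloc ignores labels entirely, it also accepts negative positions counted from the end. Here is a sketch with fixed hypothetical values and the question's three kinds of index:

```python
import pandas as pd

data = [0.1, 0.2, 0.3, 0.4, 0.5]  # fixed values (assumption)
indexes = [None, ['a', 'b', 'c', 'd', 'e'], [10, 11, 12, 13, 14]]

firsts_lasts = []
for idx in indexes:
    s = pd.Series(data, index=idx)
    # Position 0 is the first element, position -1 the last, whatever the labels
    firsts_lasts.append((s.iloc[0], s.iloc[-1]))
print(firsts_lasts)
```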

For range slicing, pandas is less interpretive. With an integer index like myseries_three's, an integer slice such as 0:2 passed to [] is treated as positions (0 and 1, stop excluded). That is why myseries_three[0:2] works even though myseries_three[0] fails. Keep in mind that actual people (the developers who write pandas) made these decisions, and when you do something ambiguous the result may not be what you expect. To remove the ambiguity, use loc and iloc.
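The check below confirms that plain `[]` with an integer slice matches iloc on an integer index like myseries_three's. It is a sketch with fixed hypothetical values, based on the behavior described above:

```python
import pandas as pd

myseries_three = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=[10, 11, 12, 13, 14])

plain = myseries_three[0:2]         # integer slice via []: positional here
explicit = myseries_three.iloc[0:2] # the same selection, stated explicitly
print(plain.equals(explicit))       # True
print(plain.index.tolist())         # [10, 11]
```

Both forms select the same rows, but only the iloc version makes the positional intent obvious to a reader.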

iloc (position-based; the stop position is excluded)

myseries_one.iloc[0:2]

0    0.291293
1    0.381014
dtype: float64

myseries_two.iloc[0:2]

a    0.291293
b    0.381014
dtype: float64

myseries_three.iloc[0:2]

10    0.291293
11    0.381014
dtype: float64
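iloc slices are half-open like Python list slices, and out-of-range slice bounds are clipped instead of raising. Here is a minimal sketch with fixed hypothetical values:

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=[10, 11, 12, 13, 14])

# iloc slices are half-open: start is included, stop is excluded
head = s.iloc[0:2]    # positions 0 and 1
# Out-of-range slice bounds are clipped rather than raising
tail = s.iloc[3:100]  # positions 3 and 4 only
print(len(head), len(tail))
```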

loc (label-based; unlike iloc, the stop label is included)

myseries_one.loc[0:2]

0    0.291293
1    0.381014
2    0.923360
dtype: float64

myseries_two.loc[0:2]

TypeError: cannot do slice indexing on <class 'pandas.indexes.base.Index'> with these indexers [0] of <type 'int'>

myseries_three.loc[0:2]

Series([], dtype: float64)
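The loc results follow from two rules. First, label slices include the stop label. Second, on a sorted index, a range of labels that lies entirely outside the index simply selects nothing. Here is a sketch with fixed hypothetical values:

```python
import pandas as pd

data = [0.1, 0.2, 0.3, 0.4, 0.5]  # fixed values (assumption)
s_one = pd.Series(data)
s_three = pd.Series(data, index=[10, 11, 12, 13, 14])

# loc slices by label and INCLUDES the stop label
print(s_one.loc[0:2].index.tolist())     # [0, 1, 2]
print(s_three.loc[10:12].index.tolist()) # [10, 11, 12]
# Labels 0..2 fall entirely before 10 on this sorted index, so the slice is empty
print(len(s_three.loc[0:2]))             # 0
```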
