Pandas vs. NumPy DataFrames
Problem description
Consider the following lines of code:
df2 = df.copy()
df2[1:] = df[1:] / df[:-1].values - 1
df2.iloc[0, :] = 0  # .ix was removed in pandas 1.0; .iloc selects by position
Our instructor said we need to use the .values attribute to access the underlying numpy array; otherwise, our code wouldn't work.
I understand that a pandas DataFrame has an underlying representation as a numpy array, but I don't understand why we can't operate directly on the pandas DataFrame using just slicing.
Could you explain this to me?
Recommended answer
pandas focuses on tabular data structures, and when it performs operations (addition, subtraction, etc.) it matches values by their labels, not by their positions.
Consider the following DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3), index=list('abcde'), columns=list('xyz'))
Here, df[1:] is:
df[1:]
Out:
x y z
b 1.003035 0.172960 1.160033
c 0.117608 -1.114294 -0.557413
d -1.312315 1.171520 -1.034012
e -0.380719 -0.422896 1.073535
and df[:-1] is:
df[:-1]
Out:
x y z
a 1.367916 1.087607 -0.625777
b 1.003035 0.172960 1.160033
c 0.117608 -1.114294 -0.557413
d -1.312315 1.171520 -1.034012
If you do df[1:] / df[:-1], pandas divides row b by row b, row c by row c, and row d by row d. For rows a and e, it cannot find a corresponding row in the other DataFrame (a is missing from the first operand, e from the second), so it returns NaN for them:
df[1:] / df[:-1]
Out:
x y z
a NaN NaN NaN
b 1.0 1.0 1.0
c 1.0 1.0 1.0
d 1.0 1.0 1.0
e NaN NaN NaN
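The same alignment behaviour can be reproduced with fixed numbers, which makes it easier to check by hand. This is only a small sketch: the frame below and its values are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical frame with easy-to-check values (each row doubles the previous one).
df = pd.DataFrame(
    {"x": [1.0, 2.0, 4.0], "y": [10.0, 20.0, 40.0]},
    index=list("abc"),
)

# Both operands are DataFrames, so pandas aligns them on their row labels first.
# df[1:] has rows b, c; df[:-1] has rows a, b; only b exists in both.
aligned = df[1:] / df[:-1]
print(aligned)
```

Row b is divided by itself and gives 1.0, while rows a and c each appear in only one operand and come out as NaN.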
If you just want element-wise division that ignores the labels, accessing the underlying numpy array of one of the frames with .values is a way of telling pandas to skip label alignment. Since numpy arrays don't have labels, pandas simply pairs the values up by position:
df[1:]/df[:-1].values
Out:
x y z
b 0.733258 0.159028 -1.853749
c 0.117252 -6.442482 -0.480515
d -11.158359 -1.051357 1.855018
e 0.290112 -0.360981 -1.038223
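As a rough sketch of the positional version, the example below uses a small made-up frame (not from the original answer) and .to_numpy(), which newer pandas releases recommend over .values. It also shows one label-aware alternative, shifting the rows with shift(1), which should give the same ratios without dropping down to numpy.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with easy-to-check values (each row doubles the previous one).
df = pd.DataFrame(
    {"x": [1.0, 2.0, 4.0], "y": [10.0, 20.0, 40.0]},
    index=list("abc"),
)

# The right-hand side is a plain NumPy array, so there are no labels to align:
# row b is divided by row a, and row c by row b, purely by position.
ratio = df[1:] / df[:-1].to_numpy()

# Label-aware alternative: shift every row down by one, divide, drop the first row.
ratio_shift = (df / df.shift(1))[1:]

print(ratio)
print(ratio_shift)
```

Both results keep the labels of the left-hand frame (rows b and c here), and every ratio is 2.0 because each row is double the one before it.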