Ambiguity in Pandas Dataframe / Numpy Array "axis" definition

Problem description

I've been very confused about how python axes are defined, and whether they refer to a DataFrame's rows or columns. Consider the code below:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], columns=["col1", "col2", "col3", "col4"])
>>> df
   col1  col2  col3  col4
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3

So if we call df.mean(axis=1), we'll get a mean across the rows:

>>> df.mean(axis=1)
0    1
1    2
2    3

However, if we call df.drop(name, axis=1), we actually drop a column, not a row:

>>> df.drop("col4", axis=1)
   col1  col2  col3
0     1     1     1
1     2     2     2
2     3     3     3

Can someone help me understand what is meant by an "axis" in pandas/numpy/scipy?

A side note, DataFrame.mean just might be defined wrong. It says in the documentation for DataFrame.mean that axis=1 is supposed to mean a mean over the columns, not the rows...

Solution

It's perhaps simplest to remember it as 0=down and 1=across.

This means:

  • Use axis=0 to apply a method down each column, or to the row labels (the index).
  • Use axis=1 to apply a method across each row, or to the column labels.
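As a rough illustration of the two bullets above, the sketch below reuses the DataFrame from the question and looks at how the result of a reduction is labelled. The variable names `down` and `across` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=0: the mean is computed down each column,
# so there is one result per column label.
down = df.mean(axis=0)

# axis=1: the mean is computed across each row,
# so there is one result per row label.
across = df.mean(axis=1)

print(down.index.tolist())    # the column labels
print(across.index.tolist())  # the row labels
```

One way to read this: the axis you pass is the one that gets collapsed, and the result is labelled by the axis that remains.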

[Figure: the parts of a DataFrame that each axis refers to]

It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:

Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
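The same convention can be checked directly on a plain NumPy array. This is a minimal sketch using the values from the question; `arr`, `col_sums` and `row_sums` are names chosen here for illustration.

```python
import numpy as np

# 3 rows, 4 columns: axis 0 has length 3, axis 1 has length 4.
arr = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])
print(arr.shape)

# Summing along axis 0 moves down the rows and leaves one value per column.
col_sums = arr.sum(axis=0)

# Summing along axis 1 moves across the columns and leaves one value per row.
row_sums = arr.sum(axis=1)

print(col_sums.shape, row_sums.shape)
```

Here the shape makes the rule visible: reducing along an axis removes that axis from the shape.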

So the method in the question, df.mean(axis=1), seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows.

Similarly, df.drop(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.
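A short sketch of both directions for drop, again using the DataFrame from the question. The `index=` and `columns=` keywords shown at the end are an alternative spelling available in reasonably recent pandas versions (an assumption about your installed version), and some readers find them less ambiguous than `axis`.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
    columns=["col1", "col2", "col3", "col4"],
)

# axis=1: the label "col4" is looked up among the column labels.
without_col = df.drop("col4", axis=1)

# axis=0: the label 0 is looked up among the row labels (the index).
without_row = df.drop(0, axis=0)

# Keyword forms that name the target directly instead of using axis.
same_col = df.drop(columns="col4")
same_row = df.drop(index=0)

print(without_col.columns.tolist())
print(without_row.index.tolist())
```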
