pandas: access axis by user-defined name

Question

I am wondering whether there is a way to access the axes of pandas containers (DataFrame, Panel, etc.) by a user-defined name instead of by an integer or by "index", "columns", "minor_axis", etc.

For example, with the following data container:

from numpy.random import randn
from pandas import DataFrame

df = DataFrame(randn(3, 2), columns=['c1', 'c2'], index=['i1', 'i2', 'i3'])
df.index.name = 'myaxis1'
df.columns.name = 'myaxis2'

I would like to do this:

df.sum(axis='myaxis1') 
df.xs('c1', axis='myaxis2')  # cross section

Equally useful would be:

df.reshape(['myaxis2','myaxis1']) 

(not so relevant in this case, but it could become so as the number of dimensions increases)

The reason is that I work a lot with multi-dimensional arrays of varying dimensions, such as "time", "variable", "percentile", etc., and the same piece of code is often applied to objects that can be a DataFrame, a Panel, or even a Panel4D or a DataFrame with a MultiIndex. For now, I often test the shape of the object, or the general settings of the script, to find out which axis is the relevant one for computing a sum or a mean. But I think it would be much more convenient to forget about how the container is implemented in detail (DataFrame, Panel, etc.) and simply think about the nature of the problem (say I want to average over time; I do not want to think about whether I am working in "probabilistic" mode with several percentiles, or in "deterministic" mode with a single time series).

While writing this post I (re)discovered the very useful axes attribute. The above code can be translated into:

nms = [ax.name for ax in df.axes]
axid1 = nms.index('myaxis1')
axid2 = nms.index('myaxis2')
df.sum(axis=axid1) 
df.xs('c1', axis=axid2)  # cross section
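
The name lookup above can be wrapped in a small helper. This is only a minimal sketch: `axis_number` is a hypothetical name, not part of pandas, and it assumes every axis carries a distinct `name`.

```python
import numpy as np
import pandas as pd


def axis_number(obj, name):
    """Return the integer position of the axis whose index is named `name`.

    Hypothetical helper (not a pandas API); assumes axis names are unique.
    """
    names = [ax.name for ax in obj.axes]
    if name not in names:
        raise KeyError("no axis named %r; available names: %r" % (name, names))
    return names.index(name)


df = pd.DataFrame(np.random.randn(3, 2),
                  columns=['c1', 'c2'], index=['i1', 'i2', 'i3'])
df.index.name = 'myaxis1'
df.columns.name = 'myaxis2'

# Sum along the axis named 'myaxis1' (the index), one value per column.
col_sums = df.sum(axis=axis_number(df, 'myaxis1'))
# Cross section along the axis named 'myaxis2' (the columns).
c1 = df.xs('c1', axis=axis_number(df, 'myaxis2'))
```

Raising a `KeyError` that lists the available names makes a misspelled axis name easy to spot, instead of surfacing as the `ValueError` from `list.index`.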

and the "reshape" feature (this does not apply to the 3-d case, though):

newshape = ['myaxis2','myaxis1']
axid = [nms.index(nm) for nm in newshape]
df.swapaxes(*axid)
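
For the 2-d case, the same idea can be written as a function that takes the desired order of axis names. This is a sketch under stated assumptions: `reorder_axes` is a hypothetical name, both axes must be named, and it uses `.T` in place of `swapaxes`, which is equivalent when there are only two axes.

```python
import numpy as np
import pandas as pd


def reorder_axes(df, new_order):
    """Return `df` with its two axes arranged in the order given by name.

    Hypothetical helper for the 2-d case only; assumes both axes are named.
    """
    current = [ax.name for ax in df.axes]
    new_order = list(new_order)
    if len(new_order) != len(current) or set(new_order) != set(current):
        raise ValueError("expected a permutation of %r, got %r"
                         % (current, new_order))
    # With two axes, the only non-trivial permutation is a transpose.
    return df if new_order == current else df.T


df = pd.DataFrame(np.random.randn(3, 2),
                  columns=['c1', 'c2'], index=['i1', 'i2', 'i3'])
df.index.name = 'myaxis1'
df.columns.name = 'myaxis2'

swapped = reorder_axes(df, ['myaxis2', 'myaxis1'])
```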

Well, I have to admit that I found these solutions while writing this post (and they are already very convenient), but this could be generalized to account for a DataFrame (or other container) with MultiIndex axes, by searching all axes and labels...
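
One possible shape for that generalization is to search the level names of every axis, so that a name resolves to an (axis, level) pair whether it lives on a flat index or inside a MultiIndex. This is a rough sketch: `locate_name` and the example names ("time", "percentile", "variable") are illustrative assumptions, not pandas features.

```python
import numpy as np
import pandas as pd


def locate_name(obj, name):
    """Search every axis (and every MultiIndex level) for `name`.

    Returns (axis_number, level_number). Hypothetical helper, not a pandas
    API; assumes the name occurs only once across all axes.
    """
    for axis_number, ax in enumerate(obj.axes):
        # Index.names holds one name per level; a flat Index has one entry.
        names = list(ax.names)
        if name in names:
            return axis_number, names.index(name)
    raise KeyError("no axis or level named %r" % (name,))


index = pd.MultiIndex.from_product([[2000, 2001], [5, 50, 95]],
                                   names=['time', 'percentile'])
df = pd.DataFrame(np.random.randn(6, 2), index=index, columns=['v1', 'v2'])
df.columns.name = 'variable'

axis, level = locate_name(df, 'percentile')
# Select the rows where the 'percentile' level equals 50.
median = df.xs(50, axis=axis, level=level)
```

The calling code names only the dimension it cares about; whether that dimension is a whole axis or one level of a MultiIndex is resolved by the helper.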

In my opinion this would be a major improvement to the user-friendliness of pandas (OK, forgetting about the actual structure could have a performance cost, but users worried about performance can be careful about how they organize the data).

What do you think?

Answer

This is still experimental, but look at this page:

http://pandas.pydata.org/pandas-docs/dev/dsintro.html#panelnd-experimental

import pandas
import numpy as np

from pandas.core import panelnd

MyPanel4D = panelnd.create_nd_panel_factory(
    klass_name   = 'MyPanel4D',
    axis_orders  = ['axis4', 'axis3', 'axis2', 'axis1'],
    axis_slices  = {'axis3': 'items',
                    'axis2': 'major_axis',
                    'axis1': 'minor_axis'},
    slicer       = 'Panel',
    stat_axis=2) 
mp4d = MyPanel4D(np.random.rand(5,4,3,2))
print(mp4d)

Result:

<class 'pandas.core.panelnd.MyPanel4D'>
Dimensions: 5 (axis4) x 4 (axis3) x 3 (axis2) x 2 (axis1)
Axis4 axis: 0 to 4
Axis3 axis: 0 to 3
Axis2 axis: 0 to 2
Axis1 axis: 0 to 1

Here's the caveat: when you slice it like mp4d[0] you will get back a Panel, unless you create a hierarchy of custom objects (unfortunately, support for 'renaming' Panel/DataFrame will need to wait for 0.12-dev; it's non-trivial and there haven't been any requests).

So for higher-dimensional objects you can impose your own name structure. The axis aliasing should work as you are suggesting, but I think there are some bugs there.
