How to retrieve the original order of key-word arguments passed to a function call?

Problem description

Retrieving the order of key-word arguments passed via **kwargs would be extremely useful in the particular project I am working on. It is about making a kind of n-d numpy array with meaningful dimensions (right now called dimarray), particularly useful for geophysical data handling.

For now say we have:

import numpy as np
from dimarray import Dimarray   # the handy class I am programming

def make_data(nlat, nlon):
    """ generate some example data
    """
    values = np.random.randn(nlat, nlon)
    lon = np.linspace(-180,180,nlon)
    lat = np.linspace(-90,90,nlat)
    return lon, lat, values

What works:

>>> lon, lat, values = make_data(180,360)
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0]
-180.0 -90.0

What does not:

>>> lon, lat, values = make_data(180,180) # square, no shape checking possible !
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0] # is random 
-90.0, -180.0  # could be (actually I raise an error in such ambiguous cases)

The reason is that Dimarray's __init__ method's signature is (values, **kwargs) and since kwargs is an unordered dictionary (dict) the best it can do is check against the shape of values.
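To make the shape check concrete, here is a minimal sketch of how keyword axes could be matched to dimensions by length. The helper name `order_axes_by_shape` is hypothetical and is not taken from dimarray; plain lists stand in for numpy arrays. It only illustrates why the square case cannot be resolved.

```python
def order_axes_by_shape(shape, **axes):
    """Return (name, values) pairs ordered to match `shape`.

    Hypothetical helper, not part of dimarray: each axis is matched to a
    dimension by its length, and ambiguous input raises ValueError.
    """
    lengths = [len(v) for v in axes.values()]
    # The axis lengths must be exactly the dimension sizes, in any order.
    if sorted(lengths) != sorted(shape):
        raise ValueError("axis lengths %r do not match shape %r"
                         % (sorted(lengths), tuple(shape)))
    # Two axes of the same length cannot be told apart without an order.
    if len(set(lengths)) != len(lengths):
        raise ValueError("ambiguous: several axes share the same length")
    by_length = {len(v): (k, v) for k, v in axes.items()}
    return [by_length[n] for n in shape]


# Unambiguous case: the keyword order does not matter.
pairs = order_axes_by_shape((2, 3), lon=[0, 1, 2], lat=[0, 1])
print([name for name, _ in pairs])
```

With a shape such as (2, 2) the same call raises ValueError, which is the behaviour described above for square arrays.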

Of course, I want it to work for any kind of dimensions:

a = Dimarray(values, x1=.., x2=...,x3=...)

so it has to be hard coded with **kwargs. The chances of ambiguous cases increase with the number of dimensions. There are ways around that, for instance with a signature (values, axes, names, **kwargs) it is possible to do:

a = Dimarray(values, [lat, lon], ["lat","lon"]) 

but this syntax is cumbersome for interactive use (ipython), since I would like this package to really be a part of my (and others !!) daily use of python, as an actual replacement of numpy arrays in geophysics.

I would be VERY interested in a way around that. The best I can think of right now is to use inspect module's stack method to parse the caller's statement:

import inspect
def f(**kwargs):
    print inspect.stack()[1][4]
    return tuple([kwargs[k] for k in kwargs])

>>> print f(lon=360, lat=180)
[u'print f(lon=360, lat=180)\n']
(180, 360)

>>> print f(lat=180, lon=360)
[u'print f(lat=180, lon=360)\n']
(180, 360)

One could work something out from that, but there are unsolvable issues since stack() catches everything on the line:

>>> print (f(lon=360, lat=180), f(lat=180, lon=360))
[u'print (f(lon=360, lat=180), f(lat=180, lon=360))\n']
[u'print (f(lon=360, lat=180), f(lat=180, lon=360))\n']
((180, 360), (180, 360))

Is there any other inspect trick I am not aware of, which could solve this problem ? (I am not familiar with this module) I would imagine getting the piece of code which is right between the brackets lon=360, lat=180 should be something feasible, no??

So for the first time in python I have the feeling of hitting a hard wall in terms of doing something which is theoretically feasible based on all available information (the ordering provided by the user IS valuable information !!!).

I read interesting suggestions by Nick there: https://mail.python.org/pipermail/python-ideas/2011-January/009054.html and was wondering whether this idea has moved forward somehow?

I see why it is not desirable to have an ordered **kwargs in general, but a patch for these rare cases would be neat. Anyone aware of a reliable hack?

NOTE: this is not about pandas, I am actually trying to develop a light-weight alternative for it, whose usage remains very close to numpy. Will soon post the gitHub link.

EDIT: Note this is relevant for interactive use of dimarray. The dual syntax is needed anyway.

EDIT2: I also see counter arguments that knowing the data is not ordered could also be seen as valuable information, since it leaves Dimarray the freedom to check the shape of values and adjust the order automatically. It could even be that not remembering the dimensions of the data occurs more often than having the same size for two dimensions. So right now, I guess it is fine to raise an error for ambiguous cases, asking the user to provide the names argument. Nevertheless, it would be neat to have the freedom to make that kind of choice (how the Dimarray class should behave), instead of being constrained by a missing feature of python.

EDIT 3, SOLUTIONS: after the suggestion of kazagistar:

I did not mention that there are other optional attribute parameters such as name="" and units="", and a couple of other parameters related to slicing, so the *args construct would need to come with keyword name testing on kwargs.

In summary, there are many possibilities:

*Choice a: keep current syntax

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")

*Choice b: kazagistar's 2nd suggestion, dropping axis definition via **kwargs

a = Dimarray(values, ("lat", mylat), ("lon",mylon), name="myarray")

*Choice c: kazagistar's 2nd suggestion, with optional axis definition via **kwargs (note this involves names= to be extracted from **kwargs, see background below)

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, ("lat", mylat), ("lon",mylon), name="myarray")

*Choice d: kazagistar's 3rd suggestion, with optional axis definition via **kwargs

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [("lat", mylat), ("lon",mylon)], name="myarray")

Hmm, it comes down to aesthetics, and to some design questions (Is lazy ordering an important feature in interactive mode?). I am hesitating between b) and c). I am not sure the **kwargs really brings something. Ironically enough, what I started to criticize became a feature when thinking more about it...

Thanks very much for the answers. I will mark the question as answered, but you are most welcome to vote for a), b), c) or d) !

=====================

EDIT 4 : better solution: choice a) !!, but adding a from_tuples class method. The reason for that is to allow one more degree of freedom. If the axis names are not provided, they will be generated automatically as "x0", "x1", etc. It is meant to be used just like pandas, but with axis naming. This also avoids mixing up axes and attributes in **kwargs, leaving it only for the axes. There will be more as soon as I am done with the doc.

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")
a = Dimarray.from_tuples(values, ("lat", mylat), ("lon",mylon), name="myarray")

EDIT 5 : more pythonic solution? : similar to EDIT 4 above in terms of the user api, but via a wrapper dimarray, while being very strict with how Dimarray is instantiated. This is also in the spirit of what kazagistar proposed.

 from dimarray import dimarray, Dimarray 

 a = dimarray(values, lon=mylon, lat=mylat, name="myarray") # error if lon and lat have same size
 b = dimarray(values, [("lat", mylat), ("lon",mylon)], name="myarray")
 c = dimarray(values, [mylat, mylon, ...], ['lat','lon',...], name="myarray")
 d = dimarray(values, [mylat, mylon, ...], name="myarray2")

And from the class itself:

 e = Dimarray.from_dict(values, lon=mylon, lat=mylat) # error if lon and lat have same size
 e.set(name="myarray", inplace=True)
 f = Dimarray.from_tuples(values, ("lat", mylat), ("lon",mylon), name="myarray")
 g = Dimarray.from_list(values, [mylat, mylon, ...], ['lat','lon',...], name="myarray")
 h = Dimarray.from_list(values, [mylat, mylon, ...], name="myarray")

In the cases d) and h) axes are automatically named "x0", "x1", and so on, unless mylat, mylon actually belong to the Axis class (which I do not mention in this post, but Axes and Axis do their job, to build axes and deal with indexing).

Explanations:

class Dimarray(object):
    """ ndarray with meaningful dimensions and clean interface
    """
    def __init__(self, values, axes, **kwargs):
        assert isinstance(axes, Axes), "axes must be an instance of Axes"
        self.values = values
        self.axes = axes
        self.__dict__.update(kwargs)

    @classmethod
    def from_tuples(cls, values, *args, **kwargs):
        axes = Axes.from_tuples(*args)
        return cls(values, axes, **kwargs)  # forward attributes such as name=

    @classmethod
    def from_list(cls, values, axes, names=None, **kwargs):
        if names is None:
            names = ["x{}".format(i) for i in range(len(axes))]
        # from_tuples expects (name, values) pairs, so names come first
        return cls.from_tuples(values, *zip(names, axes), **kwargs)

    @classmethod
    def from_dict(cls, values, names=None,**kwargs):
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)
        # with necessary assert statements in the above
        return cls(values, axes)

Here is the trick (schematically):

def dimarray(values, axes=None, names=None, name=..,units=..., **kwargs):
    """ my wrapper with all fancy options
    """
    if len(kwargs) > 0:
        new = Dimarray.from_dict(values, names=names, **kwargs) 

    elif isinstance(axes[0], tuple):
        new = Dimarray.from_tuples(values, *axes, **kwargs) 

    else:
        new = Dimarray.from_list(values, axes, names=names, **kwargs) 

    # reserved attributes
    new.set(name=name, units=units, ..., inplace=True) 

    return new

The only thing we lose is indeed the *args syntax, which could not accommodate so many options. But that's fine.

And it makes sub-classing easy, too. How does it sound to the python experts here?
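For what it is worth, the dispatch between the three call styles can be tried out without the real classes. The sketch below is only an illustration: `normalize_axes` is a hypothetical name, plain lists stand in for axis values, and keyword axes are simply returned sorted by name, whereas the actual class would resolve their order from the shape of values.

```python
def normalize_axes(axes=None, names=None, **kwaxes):
    """Return a list of (name, values) pairs from any of the three call styles.

    Hypothetical stand-in for the dispatch inside the wrapper. Keyword axes
    come back sorted by name here, since their call order is not relied upon.
    """
    if kwaxes:
        if axes is not None:
            raise TypeError("pass axes either positionally or as keywords, not both")
        return sorted(kwaxes.items())
    if not axes:
        raise TypeError("no axes provided")
    # Style 1: a list of (name, values) tuples, order given by the list.
    if isinstance(axes[0], tuple):
        return [(name, values) for name, values in axes]
    # Style 2: a list of values plus an optional list of names.
    if names is None:
        names = ["x{}".format(i) for i in range(len(axes))]
    if len(names) != len(axes):
        raise ValueError("names and axes must have the same length")
    return list(zip(names, axes))


print(normalize_axes([[0, 1], [0, 1, 2]]))
print(normalize_axes([("lat", [0, 1]), ("lon", [0, 1, 2])]))
```

One caveat of testing `axes[0]` for a tuple is that axis values passed as tuples would be mistaken for (name, values) pairs, so this sketch assumes axis values are lists or arrays.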

(this whole discussion could be split in two parts really)

=====================

A bit of background (EDIT: in part outdated, for cases a), b), c), d) only), just in case you are interested:

*Choice a involves:

def __init__(self, values, axes=None, names=None, units="",name="",..., **kwargs):
    """ schematic representation of Dimarray's init method
    """
    # automatic ordering according to values' shape (unless names is also provided)
    # the user is allowed to forget about the exact shape of the array
    if len(kwargs) > 0:
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)

    # otherwise initialize from list
    # exact ordering + more freedom in axis naming 
    else:
        axes = Axes.from_list(axes, names)

    ...  # check consistency

    self.values = values
    self.axes = axes
    self.name = name
    self.units = units         

*Choices b) and c) impose:

def __init__(self, values, *args, **kwargs):
    ...

b) all attributes are naturally passed via kwargs, with self.__dict__.update(kwargs). This is clean.

c) Need to filter key-word arguments:

def __init__(self, values, *args, **kwargs):
   """ most flexible for interactive use
   """
   # filter out known attributes
   default_attrs = {'name':'', 'units':'', ...} 
   for k in default_attrs:
       # use the value passed by the user if present, else the default
       setattr(self, k, kwargs.pop(k, default_attrs[k]))

   # names= is also extracted from kwargs
   names = kwargs.pop('names', None)

   # same as before
   if len(kwargs) > 0:
       axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)

   # same, just unzip
   else:
       names, numpy_axes = zip(*args)
       axes = Axes.from_list(numpy_axes, names)

This is actually quite nice and handy, the only (minor) drawback is that default parameters for name="", units="" and some other more relevant parameters are not accessible by inspection or completion.
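The filtering step in c) can be isolated into a small function, which makes it easy to test on its own. This is a sketch under assumptions: `split_kwargs` and `DEFAULT_ATTRS` are hypothetical names, and the reserved attributes are limited to name and units for the example.

```python
# Assumed set of reserved attributes and their defaults (illustration only).
DEFAULT_ATTRS = {"name": "", "units": ""}


def split_kwargs(kwargs, defaults=DEFAULT_ATTRS):
    """Split keyword arguments into (attributes, axes).

    Reserved attribute names are taken out of the keyword arguments, with
    their default used when absent; whatever remains is treated as an axis.
    """
    remaining = dict(kwargs)  # copy so the caller's dict is left untouched
    attrs = {k: remaining.pop(k, default) for k, default in defaults.items()}
    return attrs, remaining


attrs, axes = split_kwargs({"lon": [0, 1], "lat": [0, 1, 2], "name": "myarray"})
print(attrs)
print(sorted(axes))
```

Keeping the defaults in one module-level dictionary is also a partial answer to the drawback mentioned above, since the defaults can at least be looked up there.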

*Choice d: clear __init__

def __init__(self, values, axes, name="", units="", ..., **kwaxes)

But it is a bit verbose indeed.

==========

EDIT, FYI: I ended up using a list of tuples for the axes parameter, or alternatively the parameters dims= and labels= for axis name and axis values, respectively. The related project dimarray is on github. Thanks again to kazagistar.

Solution

No, you cannot know the order in which items were added to a dictionary, since tracking this would increase the complexity of implementing the dictionary significantly. (For when you really, really need this, collections.OrderedDict has you covered.)
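A note for readers on newer interpreters: PEP 468 made **kwargs preserve the order of the call from Python 3.6 onward, so the limitation above applies to the older versions this discussion was written for. The sketch below shows both situations; `kwarg_names` is just an illustrative helper, not part of any library.

```python
from collections import OrderedDict


def kwarg_names(**kwargs):
    """Return the keyword names in the order the function received them."""
    return tuple(kwargs)


# On Python 3.6+ (PEP 468) the order of the call is preserved.
print(kwarg_names(lon=360, lat=180))
print(kwarg_names(lat=180, lon=360))

# On older interpreters, an explicit OrderedDict built from pairs keeps
# the order regardless of how **kwargs behaves.
axes = OrderedDict([("lat", 180), ("lon", 360)])
print(list(axes))
```

Code that has to run on older versions as well cannot rely on the first form, which is why the explicit alternatives below remain useful.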

However, have you considered some basic alternative syntax? For example:

a = Dimarray(values, 'lat', lat, 'lon', lon)

or (probably the best option)

a = Dimarray(values, ('lat', lat), ('lon', lon))

or (most explicit)

a = Dimarray(values, [('lat', lat), ('lon', lon)])

At some level though, arguments that need ordering are inherently positional. **kwargs is often abused for labeling, but argument names generally shouldn't be "data", since they are a pain to set programmatically. Just make the association between the two parts of the data explicit with a tuple, use a list to preserve the ordering, and provide strong assertions + error messages to make it clear when the input is invalid and why.
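As a rough illustration of that last point, a validator for the list-of-tuples form might look like the sketch below. The function name `check_axes` is hypothetical, it raises exceptions rather than using bare assert statements, and plain lists stand in for numpy arrays.

```python
def check_axes(shape, axes):
    """Validate a list of (name, values) pairs against the shape of the data.

    Hypothetical helper: returns `axes` unchanged when valid, otherwise
    raises an error that says which axis is wrong and why.
    """
    if not isinstance(axes, list):
        raise TypeError("axes must be a list of (name, values) tuples, got %s"
                        % type(axes).__name__)
    if len(axes) != len(shape):
        raise ValueError("data has %d dimensions but %d axes were given"
                         % (len(shape), len(axes)))
    seen = set()
    for i, item in enumerate(axes):
        if not (isinstance(item, tuple) and len(item) == 2):
            raise TypeError("axis %d must be a (name, values) tuple" % i)
        name, values = item
        if not isinstance(name, str):
            raise TypeError("name of axis %d must be a string" % i)
        if name in seen:
            raise ValueError("duplicate axis name %r" % name)
        seen.add(name)
        # The list order is the dimension order, so lengths are checked in place.
        if len(values) != shape[i]:
            raise ValueError("axis %r has length %d but dimension %d has size %d"
                             % (name, len(values), i, shape[i]))
    return axes


print([name for name, _ in check_axes((2, 3), [("lat", [0, 1]), ("lon", [0, 1, 2])])])
```

Because the order is given by the list, a square array is no longer a special case: swapping two axes of equal length is accepted as the user's explicit choice.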
