Creating a namedtuple object using only a subset of arguments passed

Problem description

I am pulling rows from a MySQL database as dictionaries (using SSDictCursor) and doing some processing, using the following approach:

from collections import namedtuple

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __init__(self, *args):
        super(Foo, self).__init__(self, *args)

    # ...some class methods below here

class Bar(namedtuple('Bar', ['id', 'address', 'city', 'state'])):
    __slots__ = ()

    def __init__(self, *args):
        super(Bar, self).__init__(self, *args)

    # some class methods here...

# more classes for distinct processing tasks...

To use namedtuple, I have to know exactly the fields I want beforehand, which is fine. However, I would like to allow the user to feed a simple SELECT * statement into my program, which will then iterate through the rows of the result set, performing multiple tasks using these different classes. In order to make this work, my classes have to somehow examine the N fields coming in from the cursor and take only the particular subset M < N corresponding to the names expected by the namedtuple definition.

My first thought was to try writing a single decorator that I could apply to each of my classes, which would examine the class to see what fields it was expecting, and pass only the appropriate arguments to the new object. But I've just started reading about decorators in the past few days, and I'm not that confident yet with them.

So my question is in two parts:

  1. Is this possible to do with a single decorator, that will figure out which fields are needed by the specific class being decorated?
  2. Is there an alternative with the same functionality that will be easier to use, modify and understand?

I have too many potential permutations of tables and fields, with millions of rows in each result set, to just write one all-purpose namedtuple subclass to deal with each different task. Query time and available memory have proven to be limiting factors.

If needed:

>>> sys.version
'2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)]'

Solution

First, you have to override __new__ in order to customize namedtuple creation, because a namedtuple's __new__ method checks its arguments before you even get to __init__.
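To see why `__init__` is too late, here is a minimal sketch. The class name `Strict` and the `init_calls` list are hypothetical and exist only for this demonstration. The extra keyword is rejected by the generated `__new__`, so the overridden `__init__` is never reached for that call:

```python
from collections import namedtuple

class Strict(namedtuple('Strict', ['id', 'name', 'age'])):
    __slots__ = ()
    init_calls = []  # hypothetical bookkeeping, only for this demo

    def __init__(self, *args, **kwargs):
        # Record every time __init__ is actually reached.
        Strict.init_calls.append((args, kwargs))

try:
    Strict(id=1, name='a', age=3, spam=4)
    error = None
except TypeError as exc:
    error = str(exc)  # the message names the unexpected keyword 'spam'

Strict(1, 'a', 3)  # a valid call does reach __init__
```

After this runs, `init_calls` holds a single entry, from the valid call only.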

Second, if your goal is to accept and filter keyword arguments, you need to take **kwargs and filter and pass that through, not just *args.

So, putting it together:

from collections import namedtuple

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        # Keep only the keywords that name a field of this namedtuple.
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super(Foo, cls).__new__(cls, *args, **kwargs)
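Since the rows in the question arrive as dictionaries, the class above could be fed directly with `**row`. The sketch below repeats the class so it runs on its own; the `row` contents are made up for illustration and stand in for whatever the cursor returns:

```python
from collections import namedtuple

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super(Foo, cls).__new__(cls, *args, **kwargs)

# Hypothetical row, shaped like a dict cursor result from SELECT *.
row = {'id': 7, 'name': 'Ann', 'age': 41,
       'address': '1 Main St', 'city': 'Springfield'}

foo = Foo(**row)  # 'address' and 'city' are silently dropped
```

Note that a row missing one of the expected fields would still raise a `TypeError`, because only extra keys are filtered out.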


You could replace that dict comprehension with itemgetter, but every time I use itemgetter with multiple keys, nobody understands what it means, so I've reluctantly stopped using it.
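For comparison, one way the `itemgetter` variant might look is sketched below. This is an assumption about what was meant, not the answer's own code. It accepts keyword arguments only, and `itemgetter` raises `KeyError` when a field is missing, so it behaves differently from the dict comprehension in that case:

```python
from collections import namedtuple
from operator import itemgetter

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __new__(cls, **kwargs):
        # With several keys, itemgetter returns a tuple of the values
        # in field order; unknown keys in kwargs are simply never read.
        values = itemgetter(*cls._fields)(kwargs)
        return super(Foo, cls).__new__(cls, *values)

foo = Foo(id=1, name=2, age=3, spam=4)

try:
    Foo(id=1, name=2)  # 'age' is missing
    missing = None
except KeyError as exc:
    missing = exc.args[0]
```

With a single-field namedtuple, `itemgetter` would return the bare value rather than a tuple, so this sketch only applies to two or more fields.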


You can also override __init__ if you have a reason to do so, because it will be called as soon as __new__ returns a Foo instance.

But you don't need to just for this, because the namedtuple's __init__ doesn't take any arguments or do anything; the values have already been set in __new__ (just as with tuple and other immutable types). It looks like with CPython 2.7 you actually can call super(Foo, self).__init__(*args, **kwargs) and it'll just be ignored, but with PyPy 1.9 and CPython 3.3 you get a TypeError. At any rate, there's no reason to pass them, and nothing saying it should work, so don't do it even in CPython 2.7.

Note that your __init__ will get the unfiltered kwargs. If you want to change that, you could mutate kwargs in-place in __new__, instead of making a new dictionary. But I believe that still isn't guaranteed to do anything; it just makes it implementation-defined whether you get the filtered args or unfiltered, instead of guaranteeing the unfiltered.
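A small sketch of that behavior, as observed on CPython: `__new__` filters its own copy of the keywords, while `__init__` still receives everything the caller passed. The `seen_by_init` list is a hypothetical helper used only to observe the call:

```python
from collections import namedtuple

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()
    seen_by_init = []  # hypothetical, only to observe what __init__ receives

    def __new__(cls, *args, **kwargs):
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super(Foo, cls).__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        # Deliberately forwards nothing to the base __init__.
        Foo.seen_by_init.append(sorted(kwargs))

foo = Foo(id=1, name=2, age=3, spam=4)
```

The instance holds only the three fields, but the recorded keyword names still include `spam`.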


So, can you wrap this up? Sure!

def LenientNamedTuple(name, fields):
    class Wrapper(namedtuple(name, fields)):
        __slots__ = ()
        def __new__(cls, *args, **kwargs):
            args = args[:len(fields)]
            kwargs = {k: v for k, v in kwargs.items() if k in fields}
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)
    return Wrapper

Note that this has the advantage of not having to use the quasi-private/semi-documented _fields class attribute, because we already have fields as a parameter.
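One caveat with relying on the `fields` parameter: `namedtuple` also accepts the field names as a single string such as `'id name age'`, and in that case `len(fields)` counts characters and `k in fields` becomes a substring test. A hedged variant that tolerates both forms is sketched below. The name `lenient_namedtuple` is hypothetical, and it does fall back on `_fields` (which the `collections` documentation lists) to get the normalized names:

```python
from collections import namedtuple

def lenient_namedtuple(name, fields):
    base = namedtuple(name, fields)
    names = base._fields  # normalized tuple of field names

    class Wrapper(base):
        __slots__ = ()

        def __new__(cls, *args, **kwargs):
            args = args[:len(names)]
            kwargs = {k: v for k, v in kwargs.items() if k in names}
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)

    return Wrapper

# Field names given as one whitespace-separated string.
Foo = lenient_namedtuple('Foo', 'id name age')

positional = Foo(1, 2, 3, 4, 5)
keyword = Foo(id=1, name=2, age=3, d=9)  # 'd' is a substring of 'id name age'
```

If the field names are always passed as a list, as in the answer, the original version does not need this.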

Also, while we're at it, I added a line to toss away any excess positional arguments, as suggested in a comment.
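As for the first part of the question, the same idea could probably be packaged as a class decorator. The following is only a sketch under that assumption; `lenient` and the `label` method are hypothetical names, and the decorator reads the expected fields from `_fields` on the decorated class:

```python
from collections import namedtuple

def lenient(cls):
    """Hypothetical class decorator: drop arguments cls does not expect."""
    fields = cls._fields

    class Lenient(cls):
        __slots__ = ()

        def __new__(new_cls, *args, **kwargs):
            args = args[:len(fields)]
            kwargs = {k: v for k, v in kwargs.items() if k in fields}
            return super(Lenient, new_cls).__new__(new_cls, *args, **kwargs)

    Lenient.__name__ = cls.__name__  # keep the original name in reprs
    return Lenient

@lenient
class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def label(self):
        return '%s (%s)' % (self.name, self.age)

foo = Foo(id=1, name='Ann', age=41, city='Springfield')
extra_positional = Foo(1, 'Ann', 41, 'x', 'y')
```

One thing to watch with this design: after decoration the name `Foo` refers to the wrapper subclass, so an explicit `super(Foo, self)` inside the decorated class body would resolve against the wrapper rather than the original class. The factory function avoids that, which may be a reason to prefer it.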


Now you just use it as you'd use namedtuple, and it automatically ignores any excess arguments:

class Foo(LenientNamedTuple('Foo', ['id', 'name', 'age'])):
    pass

print(Foo(id=1, name=2, age=3, spam=4))
print(Foo(1, 2, 3, 4, 5))
print(Foo(1, age=3, name=2, eggs=4))
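Tying this back to the question, each `SELECT *` row could then be handed to several task-specific classes, with each one picking out its own fields. In the sketch below, `rows` is a made-up stand-in for the dictionaries an `SSDictCursor` would yield, and `LenientNamedTuple` is repeated from the answer so the example runs on its own:

```python
from collections import namedtuple

def LenientNamedTuple(name, fields):
    class Wrapper(namedtuple(name, fields)):
        __slots__ = ()

        def __new__(cls, *args, **kwargs):
            args = args[:len(fields)]
            kwargs = {k: v for k, v in kwargs.items() if k in fields}
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)

    return Wrapper

class Foo(LenientNamedTuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

class Bar(LenientNamedTuple('Bar', ['id', 'address', 'city', 'state'])):
    __slots__ = ()

# Hypothetical rows; a real program would iterate over the cursor instead.
rows = [
    {'id': 1, 'name': 'Ann', 'age': 41,
     'address': '1 Main St', 'city': 'Springfield', 'state': 'IL'},
    {'id': 2, 'name': 'Bob', 'age': 35,
     'address': '9 Oak Ave', 'city': 'Portland', 'state': 'OR'},
]

foos = []
bars = []
for row in rows:
    foos.append(Foo(**row))  # keeps id, name, age
    bars.append(Bar(**row))  # keeps id, address, city, state
```

With millions of rows, processing each `Foo` and `Bar` inside the loop rather than collecting them in lists would likely be kinder to memory.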


I've uploaded a test, replacing the dict comprehension with dict() on a genexpr for 2.6 compatibility (2.6 is the earliest version with namedtuple), but without the args truncating. It works with positional, keyword, and mixed args, including out-of-order keywords, in CPython 2.6.7, 2.7.2, 2.7.5, 3.2.3, 3.3.0, and 3.3.1, PyPy 1.9.0 and 2.0b1, and Jython 2.7b.
