Apply a mask to multiple lines (syntactic sugar?)


Question

I'm looking for an elegant (or more elegant) way to code a particular use case in numpy. The use case is a large data set (so efficiency matters) with 100+ fields, over 1,000 lines of code, and multiple sections of code where I would like to process only a subset of the fields. As long as I'm processing all observations, this is clean and efficient in plain numpy:

import numpy as np

wages = np.arange(40000,60000,2000)
cg    = np.arange(0,100000,10000)
ded   = np.repeat([6000,9000],5)
exem  = np.repeat([2000,4000],5)

agi  = wages + cg
tinc = agi - ded
tinc = tinc - exem

But in many code subsections I want to process only a subset of the observations for, say, 30 lines of code and this is the best I can come up with:

agi  = wages + cg
mask = wages < 50001
tinc = agi.copy()   # copy, so the masked updates below do not also modify agi

tinc[mask] = agi[mask] - ded[mask]
tinc[mask] = tinc[mask] - exem[mask]
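
One detail worth checking in masked updates like this: assigning one array name to another does not copy the data, so an in-place masked assignment made through one name is also visible through the other. A minimal sketch with made-up three-element arrays (the values and the name `alias` are illustrative only):

```python
import numpy as np

agi = np.array([40000, 52000, 64000])
mask = np.array([True, False, True])

alias = agi           # second name for the same buffer, no data copied
alias[mask] -= 1000   # in-place masked update is visible through agi too
assert np.shares_memory(alias, agi)

agi = np.array([40000, 52000, 64000])
tinc = agi.copy()     # independent buffer
tinc[mask] -= 1000    # agi is left unchanged
```

If the original values are still needed after the masked section, taking an explicit `.copy()` first is the simplest way to keep them.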

That's not horrible, don't get me wrong, but multiply this by 100s of variables and hundreds of lines of code. Is there any way to do something like the following, without resorting to cython/numba loops?

# fake code, just to convey the desired syntax
agi  = wages + cg
tinc = agi

mask( wages < 50001 ):    # i.e. inside a python loop, would be "if wages < 50001:"
   tinc = agi - ded
   tinc = tinc - exem

In other words, I'd like to define subsections of code and specify that the exact same mask should apply to every single array within the code section, without explicitly typing the mask for every single array.

(Btw, I know there may be some alternative approaches via pandas, but for now would prefer to explore my best option via numpy. I may re-ask this question with a pandas tag later on.)

Recommended answer

I'm not recommending this, but… you could do it with a horribly magic context manager. For example:

import contextlib

@contextlib.contextmanager
def masking(namespace, mask):
    # If you don't have a fixed set of maskable variables, make it
    # an instance/global/local variable, like `_names`, or just
    # [name for name, value in namespace.items() if isinstance(value, np.ndarray)]
    names = 'tinc agi ded exem'.split()
    stash = {name: namespace[name] for name in names}
    for name in names:
        namespace[name] = namespace[name][mask]
    try:
        yield
    finally:
        for name in names:
            # Copy the masked result back into the selected rows, then
            # restore the full-length array. Names that share one array
            # (e.g. `tinc = agi` without a copy) would overwrite each other.
            stash[name][mask] = namespace[name]
            namespace[name] = stash[name]

Now you can do this:

with masking(globals(), wages < 50001):
    tinc = agi - ded
    tinc = tinc - exem

with masking(self.__dict__, self.wages < 50001):
    self.tinc = self.agi - self.ded
    self.tinc = self.tinc - self.exem

# etc.
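
If rewriting `globals()` feels too magic, a less invasive variant of the same idea is to pass the arrays in explicitly and work on a small namespace object. This is only a sketch under assumptions, not part of the answer above: `masked_fields` and `fields` are hypothetical names, all arrays are assumed to have the same length, and the full-length arrays are updated in place when the block exits without an exception.

```python
import contextlib
import types

import numpy as np


@contextlib.contextmanager
def masked_fields(arrays, mask):
    """Yield masked copies of `arrays` (a dict of equal-length ndarrays)
    and copy them back into the selected rows on exit."""
    view = types.SimpleNamespace(**{k: v[mask] for k, v in arrays.items()})
    yield view
    # Reached only if the block did not raise: write results back in place.
    for name, full in arrays.items():
        full[mask] = getattr(view, name)


wages = np.arange(40000, 60000, 2000)
cg = np.arange(0, 100000, 10000)
ded = np.repeat([6000, 9000], 5)
exem = np.repeat([2000, 4000], 5)

agi = wages + cg
tinc = agi.copy()  # copy so that updating tinc leaves agi untouched

fields = {"tinc": tinc, "agi": agi, "ded": ded, "exem": exem}
with masked_fields(fields, wages < 50001) as f:
    f.tinc = f.agi - f.ded
    f.tinc = f.tinc - f.exem
```

The cost is the `f.` prefix on every name, but the mask is still typed only once, and the same code works inside functions, where assigning into `locals()` is not reliable.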
