"Piping" output from one function to another using Python infix syntax


Question

I'm trying to replicate, roughly, the dplyr package from R using Python/Pandas (as a learning exercise). Something I'm stuck on is the "piping" functionality.

In R/dplyr, this is done using the pipe-operator %>%, where x %>% f(y) is equivalent to f(x, y). If possible, I would like to replicate this using infix syntax (see here).

To illustrate, consider the two functions below.

import pandas as pd

def select(df, *args):
    cols = [x for x in args]
    df = df[cols]
    return df

def rename(df, **kwargs):
    for name, value in kwargs.items():
        df = df.rename(columns={'%s' % name: '%s' % value})
    return df

The first function takes a dataframe and returns only the given columns. The second takes a dataframe, and renames the given columns. For example:

d = {'one' : [1., 2., 3., 4., 4.],
     'two' : [4., 3., 2., 1., 3.]}

df = pd.DataFrame(d)

# Keep only the 'one' column.
df = select(df, 'one')

# Rename the 'one' column to 'new_one'.
df = rename(df, one = 'new_one')

To achieve the same using pipe/infix syntax, the code would be:

df = df | select('one') \
        | rename(one = 'new_one')

So the output from the left-hand side of | gets passed as the first argument to the function on the right-hand side. Whenever I see something like this done (here, for example), it involves lambda functions. Is it possible to pipe a Pandas dataframe between functions in the same manner?

I know Pandas has the .pipe method, but what's important to me is the syntax of the example I provided. Any help would be appreciated.

Recommended answer

It is hard to implement this using the bitwise or operator because pandas.DataFrame implements it. If you don't mind replacing | with >>, you can try this:

import pandas as pd

def select(df, *args):
    cols = [x for x in args]
    return df[cols]


def rename(df, **kwargs):
    for name, value in kwargs.items():
        df = df.rename(columns={'%s' % name: '%s' % value})
    return df


class SinkInto(object):
    def __init__(self, function, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs
        self.function = function

    def __rrshift__(self, other):
        return self.function(other, *self.args, **self.kwargs)

    def __repr__(self):
        return "<SinkInto {} args={} kwargs={}>".format(
            self.function, 
            self.args, 
            self.kwargs
        )

df = pd.DataFrame({'one' : [1., 2., 3., 4., 4.],
                   'two' : [4., 3., 2., 1., 3.]})

Then you can do:

>>> df
   one  two
0    1    4
1    2    3
2    3    2
3    4    1
4    4    3

>>> df = df >> SinkInto(select, 'one') \
            >> SinkInto(rename, one='new_one')
>>> df
   new_one
0        1
1        2
2        3
3        4
4        4
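
The choice of >> over | comes down to how Python dispatches binary operators: for a >> b it first tries the >> method of the left operand, and only if that is missing or returns NotImplemented does it fall back to __rrshift__ on the right operand. The sketch below illustrates this with plain Python objects. Passive, Greedy and Sink are hypothetical stand-ins (not pandas classes); Greedy plays the role of a left operand that already handles the operator itself.

```python
class Sink:
    """Right-hand operand that receives the left-hand value."""

    def __init__(self, function):
        self.function = function

    def __rrshift__(self, other):
        # Reached only if the left operand cannot handle >> itself.
        return self.function(other)

    def __ror__(self, other):
        # Same idea for |.
        return self.function(other)


class Passive:
    """Left operand that defines neither >> nor |."""


class Greedy:
    """Left operand that handles | itself, whatever the right operand is."""

    def __or__(self, other):
        return "handled by Greedy.__or__"


describe = Sink(lambda value: type(value).__name__)

piped_shift = Passive() >> describe   # falls back to Sink.__rrshift__
piped_or = Passive() | describe       # falls back to Sink.__ror__
blocked_or = Greedy() | describe      # Greedy.__or__ wins; Sink is never called
```

As the answer describes it, pandas.DataFrame is in the Greedy position for |, since it implements that operator itself, while >> is left for the right-hand operand to handle.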

In Python 3 you can abuse unicode:

>>> print('\u01c1')
ǁ
>>> ǁ = SinkInto
>>> df >> ǁ(select, 'one') >> ǁ(rename, one='new_one')
   new_one
0        1
1        2
2        3
3        4
4        4

[Update]

Thanks for your response. Would it be possible to make a separate class (like SinkInto) for each function to avoid having to pass the functions as an argument?

How about a decorator?

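def pipe(original):
    class PipeInto(object):
        # Shared by all instances: the wrapped function only.
        data = {'function': original}

        def __init__(self, *args, **kwargs):
            # Keep the arguments on the instance. Writing them into the
            # class-level dict would share them between all stages built
            # from the same decorated function.
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            return self.data['function'](
                other,
                *self.args,
                **self.kwargs
            )

    return PipeInto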


@pipe
def select(df, *args):
    cols = [x for x in args]
    return df[cols]


@pipe
def rename(df, **kwargs):
    for name, value in kwargs.items():
        df = df.rename(columns={'%s' % name: '%s' % value})
    return df

Now you can decorate any function that takes a DataFrame as the first argument:

>>> df >> select('one') >> rename(one='first')
   first
0      1
1      2
2      3
3      4
4      4
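
One detail worth checking in a class-based decorator like this is where the arguments live. If they are written into a dictionary defined on the class, every stage created from the same decorated function shares them, so a stage built earlier picks up the arguments of a stage built later. The following sketch uses plain lists instead of DataFrames and a hypothetical function named take (neither is from the original answer), and keeps the arguments on the instance so that stages stay independent.

```python
def pipe(original):
    """Turn original(data, *args, **kwargs) into a stage usable with >>."""

    class PipeInto(object):
        # staticmethod keeps the function from being bound to the instance.
        function = staticmethod(original)

        def __init__(self, *args, **kwargs):
            # Instance attributes: each stage remembers its own arguments.
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            return self.function(other, *self.args, **self.kwargs)

    return PipeInto


@pipe
def take(items, count):
    """Hypothetical helper: keep the first `count` items."""
    return items[:count]


first_two = take(2)
first_three = take(3)   # creating this stage must not change first_two

result_two = [10, 20, 30, 40] >> first_two
result_three = [10, 20, 30, 40] >> first_three
```

Storing a stage in a variable and reusing it later is the case where this matters; a one-line chain that builds and consumes each stage immediately would not reveal the difference.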

Python is awesome!

I know that languages like Ruby are "so expressive" that they encourage people to write every program as a new DSL, but this is kind of frowned upon in Python. Many Pythonistas consider overloading operators for a different purpose to be sinful blasphemy.

User OHLÁLÁ is not impressed:

The problem with this solution is when you are trying to call the function instead of piping. – OHLÁLÁ

You can implement the dunder-call method:

def __call__(self, df):
    return df >> self

Then:

>>> select('one')(df)
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0

Looks like it is not easy to please OHLÁLÁ:

In that case you need to call the object explicitly:
select('one')(df). Is there a way to avoid that? – OHLÁLÁ

Well, I can think of a solution, but there is a caveat: your original function must not take a second positional argument that is a pandas dataframe (keyword arguments are OK). Let's add a __new__ method to our PipeInto class inside the decorator that tests whether the first argument is a dataframe; if it is, we just call the original function with the arguments:

def __new__(cls, *args, **kwargs):
    if args and isinstance(args[0], pd.DataFrame):
        return cls.data['function'](*args, **kwargs)
    return super().__new__(cls)

It seems to work but probably there is some downside I was unable to spot.

>>> select(df, 'one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0

>>> df >> select('one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
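
Putting the pieces together, here is a self-contained sketch of the decorator with all three entry points: piping with >>, calling a configured stage, and calling the function directly. To keep it runnable without pandas it assumes a hypothetical factory pipe_for(data_type) that takes the type to test for (the answer hard-codes pd.DataFrame), and a hypothetical function drop_first that works on lists. The caveat above still applies: a stage whose first configuration argument is itself of data_type would be mistaken for a direct call.

```python
def pipe_for(data_type):
    """Hypothetical factory: build a pipe decorator for one data type."""

    def pipe(original):
        class PipeInto(object):
            def __new__(cls, *args, **kwargs):
                # Direct call such as f(data, ...): run the function now.
                if args and isinstance(args[0], data_type):
                    return original(*args, **kwargs)
                # Otherwise build a stage that waits for its data.
                return super().__new__(cls)

            def __init__(self, *args, **kwargs):
                # Only runs when __new__ returned a PipeInto instance.
                self.args = args
                self.kwargs = kwargs

            def __rrshift__(self, other):
                # data >> stage
                return original(other, *self.args, **self.kwargs)

            def __call__(self, data):
                # stage(data) behaves like data >> stage
                return data >> self

        return PipeInto

    return pipe


@pipe_for(list)
def drop_first(items, count=1):
    """Hypothetical helper: drop the first `count` items."""
    return items[count:]


numbers = [1, 2, 3, 4]

piped = numbers >> drop_first(count=2)   # pipe syntax
called = drop_first(count=2)(numbers)    # configured stage, then call
direct = drop_first(numbers, 2)          # ordinary function call
```

All three forms end up calling the same underlying function with the same arguments, so they should agree on the result.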
