Splitting dataframe into two and using tilde ~ as variable


Question

I want to do two similar operations with Pandas in Python 3: one with a tilde and one without.

1 - df = df[~(df.teste.isin(["Place"]))] 
2 - df = df[(df.teste.isin(["Place"]))]

I tried to declare the tilde as a variable so that I could write just one line and then decide whether to apply it. But it doesn't work:

tilde = ["~", ""]
df = df[tilde[0](df.teste.isin(["Place"]))]

Is it possible to do something that reduces my code? I am writing many identical lines that differ only in the tilde.

Thanks!

Why I want the tilde as a variable:

def server_latam(df):
    df.rename(columns={'Computer:OSI':'OSI'}, inplace=True) 
    df = df[~(df.teste.isin(["Place"]))]

    df1 = df.loc[df.model != 'Virtual Platform', 'model'].count()
    print("LATAM")
    print("Physical Servers: ",df1)
    df2 = df.loc[df.model == 'Virtual Platform', 'model'].count()
    print("Virtual Servers: ",df2)
    df3 = df.groupby('platformName').size().reset_index(name='by OS: ')
    print(df3)

def server_latam_without_tilde(df):
    df.rename(columns={'Computer:OSI':'OSI'}, inplace=True) 
    df = df[(df.teste.isin(["Place"]))]

    df1 = df.loc[df.model != 'Virtual Platform', 'model'].count()
    print("LATAM")
    print("Physical Servers: ",df1)
    df2 = df.loc[df.model == 'Virtual Platform', 'model'].count()
    print("Virtual Servers: ",df2)
    df3 = df.groupby('platformName').size().reset_index(name='by OS: ')
    print(df3)

In the second line of each function the tilde appears.

Recommended answer

For your limited use case, there is limited benefit in what you are requesting.

Your real problem, however, is the number of variables you are having to create. You can halve them via GroupBy and a calculated grouper:

import pandas as pd

df = pd.DataFrame({'teste': ['Place', 'Null', 'Something', 'Place'],
                   'value': [1, 2, 3, 4]})

dfs = dict(tuple(df.groupby(df['teste'] == 'Place')))

{False:        teste  value
        1       Null      2
        2  Something      3,

 True:         teste  value
            0  Place      1
            3  Place      4}

Then access your dataframes via dfs[0] and dfs[1], since False == 0 and True == 1. There is a benefit with this last example. You now remove the need to create new variables unnecessarily. Your dataframes are organized since they exist in the same dictionary.
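As a minimal sketch of the lookup described above, reusing the sample frame from this answer (the names `place` and `not_place` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'teste': ['Place', 'Null', 'Something', 'Place'],
                   'value': [1, 2, 3, 4]})

# Group on a boolean Series: the keys are False (no match) and True (match).
dfs = dict(tuple(df.groupby(df['teste'] == 'Place')))

not_place = dfs[False]  # rows where teste != 'Place'
place = dfs[True]       # rows where teste == 'Place'
```

Note that GroupBy only yields groups that actually occur, so if no row (or every row) matches, one of the two keys will be missing. Using `dfs.get(True)` is one way to guard against that.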

Your precise requirement can be met via the operator module and an identity function:

from operator import invert

tilde = [invert, lambda x: x]

mask = df.teste == 'Place'  # don't repeat mask calculations unnecessarily

df1 = df[tilde[0](mask)]
df2 = df[tilde[1](mask)]
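If the goal is a single function that covers both cases, one option is to pass a boolean flag and choose the mask inside the function. This is only a sketch: `select_rows` and `exclude` are hypothetical names, not part of pandas or of the question's code.

```python
import pandas as pd


def select_rows(df, values, exclude=False):
    """Return rows whose 'teste' value is in `values`,
    or the complement when `exclude` is True."""
    mask = df['teste'].isin(values)  # compute the mask once
    return df[~mask] if exclude else df[mask]


df = pd.DataFrame({'teste': ['Place', 'Null', 'Something', 'Place'],
                   'value': [1, 2, 3, 4]})

without_place = select_rows(df, ['Place'], exclude=True)  # like df[~mask]
only_place = select_rows(df, ['Place'])                   # like df[mask]
```

The flag keeps the call site readable and avoids storing an operator in a list, at the cost of one conditional inside the function.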

Sequence unpacking

If your intention is to use one line, use sequence unpacking:

df1, df2 = (df[func(mask)] for func in tilde)

Note you can replicate the GroupBy result via:

dfs = dict(enumerate(df[func(mask)] for func in tilde))

But this is verbose and convoluted. Stick with the GroupBy solution.
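To connect the GroupBy approach to the reporting functions in the question, the sketch below computes the same counts once per group rather than in two near-identical functions. The helper `summarize` and the sample rows are assumptions made for illustration; only the `teste` and `model` columns and the 'Virtual Platform' label come from the question.

```python
import pandas as pd


def summarize(group):
    """Count physical and virtual servers in one group (hypothetical helper)."""
    physical = group.loc[group['model'] != 'Virtual Platform', 'model'].count()
    virtual = group.loc[group['model'] == 'Virtual Platform', 'model'].count()
    return {'physical': int(physical), 'virtual': int(virtual)}


# Made-up sample data for illustration only.
df = pd.DataFrame({
    'teste': ['Place', 'Other', 'Place', 'Other', 'Place'],
    'model': ['Virtual Platform', 'ProLiant', 'ProLiant',
              'Virtual Platform', 'ProLiant'],
})

# One pass over both halves: the key is True where teste == 'Place'.
summaries = {bool(is_place): summarize(group)
             for is_place, group in df.groupby(df['teste'] == 'Place')}
```

Here `summaries[True]` holds the counts for rows matching 'Place' and `summaries[False]` holds the counts for the rest, so the tilde never has to be written out.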
