Splitting a dataframe into two and using the tilde ~ as a variable
Question
I want to do two similar operations with Pandas in Python 3: one with a tilde and one without.
1 - df = df[~(df.teste.isin(["Place"]))]
2 - df = df[(df.teste.isin(["Place"]))]
I tried to declare the tilde as a variable, so that I could write just one line and then decide whether to use it with or without the tilde. But it doesn't work:
tilde = ["~", ""]
df = df[tilde[0](df.teste.isin(["Place"]))]
Is it possible to do something that would reduce my code? I am writing many identical lines that differ only in the tilde.
Thanks!
Why I want the tilde as a variable:
def server_latam(df):
    df.rename(columns={'Computer:OSI':'OSI'}, inplace=True)
    df = df[~(df.teste.isin(["Place"]))]
    df1 = df.loc[df.model != 'Virtual Platform', 'model'].count()
    print("LATAM")
    print("Physical Servers: ",df1)
    df2 = df.loc[df.model == 'Virtual Platform', 'model'].count()
    print("Virtual Servers: ",df2)
    df3 = df.groupby('platformName').size().reset_index(name='by OS: ')
    print(df3)

def server_latam_without_tilde(df):
    df.rename(columns={'Computer:OSI':'OSI'}, inplace=True)
    df = df[(df.teste.isin(["Place"]))]
    df1 = df.loc[df.model != 'Virtual Platform', 'model'].count()
    print("LATAM")
    print("Physical Servers: ",df1)
    df2 = df.loc[df.model == 'Virtual Platform', 'model'].count()
    print("Virtual Servers: ",df2)
    df3 = df.groupby('platformName').size().reset_index(name='by OS: ')
    print(df3)
The tilde appears in the second line of each function body.
Recommended answer
For your limited use case, there is limited benefit in what you are requesting.
Your real problem, however, is the number of variables you are having to create. You can halve them via GroupBy and a calculated grouper:
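import pandas as pd

df = pd.DataFrame({'teste': ['Place', 'Null', 'Something', 'Place'],
                   'value': [1, 2, 3, 4]})

# Group on a boolean mask, so the dictionary keys are False and True
dfs = dict(tuple(df.groupby(df['teste'] == 'Place')))

# dfs now holds:
# {False:        teste  value
#          1       Null      2
#          2  Something      3,
#  True:     teste  value
#          0  Place      1
#          3  Place      4}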
Then access your dataframes via dfs[0] and dfs[1], since False == 0 and True == 1. This approach has a benefit: you no longer need to create new variables unnecessarily, and your dataframes stay organized because they live in the same dictionary.
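As a rough sketch of how the GroupBy split could replace the two near-identical functions from the question: the summary logic is written once and applied to both halves of the dictionary. The helper name `summarize` and the sample rows are hypothetical, made up for illustration; only the column names and the counting expressions come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    'teste': ['Place', 'Null', 'Something', 'Place', 'Other'],
    'model': ['Virtual Platform', 'Rack', 'Virtual Platform', 'Blade', 'Tower'],
})

def summarize(part):
    """Count physical and virtual servers in one sub-frame (hypothetical helper)."""
    physical = part.loc[part['model'] != 'Virtual Platform', 'model'].count()
    virtual = part.loc[part['model'] == 'Virtual Platform', 'model'].count()
    return {'physical': int(physical), 'virtual': int(virtual)}

# Split once on the boolean mask, then reuse the same summary for both halves.
dfs = dict(tuple(df.groupby(df['teste'] == 'Place')))
summaries = {is_place: summarize(part) for is_place, part in dfs.items()}

for is_place, counts in summaries.items():
    print('teste == Place:', is_place, counts)
```

Here `summaries[False]` describes the rows the tilde version would keep, and `summaries[True]` describes the rows the version without the tilde would keep.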
Your precise requirement can be met via the operator module and an identity function:
from operator import invert
tilde = [invert, lambda x: x]
mask = df.teste == 'Place' # don't repeat mask calculations unnecessarily
df1 = df[tilde[0](mask)]
df2 = df[tilde[1](mask)]
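If the goal is a single function that decides at call time whether to apply the tilde, the same idea can be wrapped behind a boolean flag. This is a minimal sketch, not part of the original answer: the function name `filter_place` and the parameter `exclude` are assumptions chosen for illustration.

```python
from operator import invert

import pandas as pd

def filter_place(df, exclude):
    """Keep rows where teste == 'Place', or drop them when exclude is True.

    Hypothetical helper: `exclude=True` plays the role of the tilde.
    """
    mask = df['teste'] == 'Place'
    # Pick the transformation once: invert the mask, or leave it unchanged.
    transform = invert if exclude else (lambda m: m)
    return df[transform(mask)]

df = pd.DataFrame({'teste': ['Place', 'Null', 'Something', 'Place'],
                   'value': [1, 2, 3, 4]})

without_place = filter_place(df, exclude=True)   # same as df[~mask]
only_place = filter_place(df, exclude=False)     # same as df[mask]
```

Passing a flag keeps the call site readable, whereas indexing into a list such as `tilde[0]` requires the reader to remember which position holds the inversion.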
Sequence unpacking
If your intention is to use one line, use sequence unpacking:
df1, df2 = (df[func(mask)] for func in tilde)
Note you can replicate the GroupBy result via:
dfs = dict(enumerate(df[func(mask)] for func in tilde))
But this is verbose and convoluted. Stick with the GroupBy solution.