Select DataFrame rows by passing a dict or using a list comprehension (Python pandas)

Question

I want to select rows in a DataFrame by passing a dict or using a list comprehension.

I have a DataFrame with millions of rows, and I want to write a function that selects only the part of it matching a list of parameters. To complicate things, the function must take both the DataFrame and the list, and the list can contain NaN values and '0'. Those entries have to be dropped before selecting the matching rows.

The list of entries:

b = ['MUSTANG', 'Coupé', '0', np.nan, np.nan]

     AGE    KM     Brand   Model           Liter     Bodycar    Power
0    2.0  10000.0  FORD    MUSTANG          5.0        Coupé    421
1    2.0  10000.0  FORD    MUSTANG          5.0        Coupé    421
2    5.0  10400.0  FORD    MUSTANG          5.0        Coupé    421
3    5.0  10400.0  FORD    MUSTANG          5.0        Coupé    421
4   16.0  20700.0  FORD    MUSTANG          3.7        Coupé    317
5    7.0  23300.0  FORD    MUSTANG          3.7                 317
6    7.0  23300.0  FORD    MUSTANG          2.3        Coupé    301
7    7.0  23300.0  FORD    MUSTANG          5.0                 421
...

I started writing a function that removes the useless entries from the list and then tries to select the matching rows, but it fails:

  def func_mcclbp_incomp(df, mcclbp):
     ind = []

     mcclbp = [i if type(i) == str else '0' for i in mcclbp]
     ind = [i for i, x in enumerate(mcclbp) if x=='0']

     head = ['Brand','Model','Bodycar','Liter', 'Power']
     mmcclbp = {head[0]:mcclbp[0], head[1]:mcclbp[1], head[2]:mcclbp[2], \
             head[3]:mcclbp[3], head[4]:mcclbp[4]}
     for i in ind:
         del mmcclbp[head[i]]
     df = df[df[head[i]==mccblp[i]] for i in mmcclbp.key()]
     return df

I tried a list comprehension, but I get a syntax error:

File "<ipython-input-235-6f78e45f59d4>", line 1
df = df[df[head[i].isin(mccblp[i]) for i in mmcclbp.keys()]]
                                     ^
SyntaxError: invalid syntax

When I tried passing a dict instead, I got a KeyError.

The output I need when I use b is:

     AGE    KM     Brand   Model           Liter     Bodycar    Power
0    2.0  10000.0  FORD    MUSTANG          5.0        Coupé    421
1    2.0  10000.0  FORD    MUSTANG          5.0        Coupé    421
2    5.0  10400.0  FORD    MUSTANG          5.0        Coupé    421
3    5.0  10400.0  FORD    MUSTANG          5.0        Coupé    421
4   16.0  20700.0  FORD    MUSTANG          3.7        Coupé    317
6    7.0  23300.0  FORD    MUSTANG          2.3        Coupé    301

If I change b to other values, for example:

b = ['FORD', 'MUSTANG', 'Coupé', '3.7', '317']

the result should be:

     AGE    KM     Brand   Model           Liter     Bodycar    Power
4   16.0  20700.0  FORD    MUSTANG          3.7        Coupé    317

Does anyone know how I can automatically select the rows that match the list?

Thanks for your answers,

Chris

Recommended answer

You can filter with a dict: compare the relevant columns against the dict values, use DataFrame.all to check that every value in a row is True, and use the resulting mask for boolean indexing.
You also need to convert the DataFrame values to strings with astype, because all the values in the dict are strings too:

d = {'Brand':'FORD', 'Model':'MUSTANG', 'Bodycar':'Coupé', 'Liter':'3.7', 'Power':'317'}

print (df.astype(str)[list(d)] == pd.Series(d))
   Bodycar  Brand  Liter  Model  Power
0     True   True  False   True  False
1     True   True  False   True  False
2     True   True  False   True  False
3     True   True  False   True  False
4     True   True   True   True   True
6     True   True  False   True  False

mask = (df.astype(str)[list(d)] == pd.Series(d)).all(axis=1)
print (mask)
0    False
1    False
2    False
3    False
4     True
6    False
dtype: bool

df1 = df[mask]
print (df1)
    AGE       KM Brand    Model  Liter Bodycar  Power
4  16.0  20700.0  FORD  MUSTANG    3.7   Coupé    317
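
To connect this back to the question, here is a minimal sketch of a wrapper that first drops the NaN and '0' entries from the list and then applies the same mask. The function name select_rows and the HEAD column order are assumptions for illustration, not part of the original answer; the list entries are assumed to line up with Brand, Model, Bodycar, Liter, Power, as in the second example of the question. The sample frame is a shortened version of the one shown above.

```python
import numpy as np
import pandas as pd

# Assumed column order: each list entry is taken to correspond to these names.
HEAD = ['Brand', 'Model', 'Bodycar', 'Liter', 'Power']


def select_rows(df, values, columns=HEAD):
    """Return the rows of df that match every usable entry in values."""
    # Keep only real string criteria; NaN (a float) and the '0' placeholder are skipped.
    d = {col: val for col, val in zip(columns, values)
         if isinstance(val, str) and val != '0'}
    if not d:
        # No usable criteria left, so there is nothing to filter on.
        return df
    # Compare only the needed columns as strings, then require all of them to match.
    mask = (df[list(d)].astype(str) == pd.Series(d)).all(axis=1)
    return df[mask]


# Shortened sample data modelled on the frame in the question.
df = pd.DataFrame({
    'AGE': [2.0, 5.0, 16.0, 7.0, 7.0],
    'KM': [10000.0, 10400.0, 20700.0, 23300.0, 23300.0],
    'Brand': ['FORD'] * 5,
    'Model': ['MUSTANG'] * 5,
    'Liter': [5.0, 5.0, 3.7, 3.7, 2.3],
    'Bodycar': ['Coupé', 'Coupé', 'Coupé', '', 'Coupé'],
    'Power': [421, 421, 317, 317, 301],
})

b = ['FORD', 'MUSTANG', 'Coupé', '0', np.nan]
result = select_rows(df, b)
print(result)
```

Selecting df[list(d)] before calling astype(str) converts only the columns being compared, which may matter on a frame with millions of rows. Note that the comparison is on string representations, so a Liter criterion has to be written the way the float prints ('5.0', not '5').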
