Select data frame rows by passing a dict or a list comprehension in Python pandas
Question
I want to select rows in a data frame by passing a dict or a list comprehension.
I have a data frame with millions of rows, and I want to create a function that selects only the part of this data frame corresponding to a list of parameters. To complicate matters, I must pass the data frame and the list, but the list can contain NaN values and '0'. So I must delete those entries in order to select the proper rows.
The list of entries:
b = ['MUSTANG', 'Coupé', '0', np.nan, np.nan]
    AGE       KM Brand    Model  Liter Bodycar  Power
0   2.0  10000.0  FORD  MUSTANG    5.0   Coupé    421
1   2.0  10000.0  FORD  MUSTANG    5.0   Coupé    421
2   5.0  10400.0  FORD  MUSTANG    5.0   Coupé    421
3   5.0  10400.0  FORD  MUSTANG    5.0   Coupé    421
4  16.0  20700.0  FORD  MUSTANG    3.7   Coupé    317
5   7.0  23300.0  FORD  MUSTANG    3.7            317
6   7.0  23300.0  FORD  MUSTANG    2.3   Coupé    301
7   7.0  23300.0  FORD  MUSTANG    5.0            421
...
I started writing a function that removes the useless entries from the list and then tries to select the proper rows, but it failed:
def func_mcclbp_incomp(df, mcclbp):
    ind = []
    mcclbp = [i if type(i) == str else '0' for i in mcclbp]
    ind = [i for i, x in enumerate(mcclbp) if x=='0']
    head = ['Brand','Model','Bodycar','Liter', 'Power']
    mmcclbp = {head[0]:mcclbp[0], head[1]:mcclbp[1], head[2]:mcclbp[2], \
               head[3]:mcclbp[3], head[4]:mcclbp[4]}
    for i in ind:
        del mmcclbp[head[i]]
    df = df[df[head[i]==mccblp[i]] for i in mmcclbp.key()]
    return df
I tried a list comprehension, but pandas raised an error:
File "<ipython-input-235-6f78e45f59d4>", line 1
df = df[df[head[i].isin(mccblp[i]) for i in mmcclbp.keys()]]
^
SyntaxError: invalid syntax
When I tried passing a dict, I got a KeyError.
The output I need when I use b is:
    AGE       KM Brand    Model  Liter Bodycar  Power
0   2.0  10000.0  FORD  MUSTANG    5.0   Coupé    421
1   2.0  10000.0  FORD  MUSTANG    5.0   Coupé    421
2   5.0  10400.0  FORD  MUSTANG    5.0   Coupé    421
3   5.0  10400.0  FORD  MUSTANG    5.0   Coupé    421
4  16.0  20700.0  FORD  MUSTANG    3.7   Coupé    317
6   7.0  23300.0  FORD  MUSTANG    2.3   Coupé    301
If I change b to another value, such as:
b = ['FORD', 'MUSTANG', 'Coupé', '3.7', '317']
the result would be:
    AGE       KM Brand    Model  Liter Bodycar  Power
4  16.0  20700.0  FORD  MUSTANG    3.7   Coupé    317
Does anyone know how I can automatically select the rows corresponding to the list?
Thanks for your answers,
Chris.
Recommended answer
You can use a dict for filtering: compare the selected columns against the dict values, use DataFrame.all to check that every value in a row is True to build the mask, and then filter by boolean indexing.
It is also necessary to convert the DataFrame values to strings with astype, because all values of the dict are strings too:
d = {'Brand':'FORD', 'Model':'MUSTANG', 'Bodycar':'Coupé', 'Liter':'3.7', 'Power':'317'}
print (df.astype(str)[list(d)] == pd.Series(d))
   Bodycar  Brand  Liter  Model  Power
0     True   True  False   True  False
1     True   True  False   True  False
2     True   True  False   True  False
3     True   True  False   True  False
4     True   True   True   True   True
6     True   True  False   True  False
mask = (df.astype(str)[list(d)] == pd.Series(d)).all(axis=1)
print (mask)
0 False
1 False
2 False
3 False
4 True
6 False
dtype: bool
df1 = df[mask]
print (df1)
    AGE       KM Brand    Model  Liter Bodycar  Power
4  16.0  20700.0  FORD  MUSTANG    3.7   Coupé    317
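The question also asks for a function that takes a positional list which may contain NaN or '0' placeholders. A minimal sketch of how the dict-based mask could be wrapped for that case follows. The function name select_rows, the HEAD constant and the small sample frame are hypothetical, and the sketch assumes the list follows the column order Brand, Model, Bodycar, Liter, Power used in the question's own function.

```python
import numpy as np
import pandas as pd

# Assumed column order for the positional list (taken from the question's `head`).
HEAD = ['Brand', 'Model', 'Bodycar', 'Liter', 'Power']


def select_rows(df, values, columns=HEAD):
    """Return the rows of df that match every usable entry of values."""
    # Build the filter dict, skipping NaN/None and the '0' placeholder.
    d = {col: str(val) for col, val in zip(columns, values)
         if pd.notna(val) and str(val) != '0'}
    if not d:
        return df  # no criteria left, so nothing is filtered out
    # Compare as strings, column by column, then require all columns to match.
    mask = df[list(d)].astype(str).eq(pd.Series(d), axis=1).all(axis=1)
    return df[mask]


# Small hypothetical sample modelled on the question's data.
df = pd.DataFrame({
    'AGE': [2.0, 5.0, 16.0, 7.0, 7.0],
    'KM': [10000.0, 10400.0, 20700.0, 23300.0, 23300.0],
    'Brand': ['FORD'] * 5,
    'Model': ['MUSTANG'] * 5,
    'Liter': [5.0, 5.0, 3.7, 3.7, 2.3],
    'Bodycar': ['Coupé', 'Coupé', 'Coupé', '', 'Coupé'],
    'Power': [421, 421, 317, 317, 301],
})

full = select_rows(df, ['FORD', 'MUSTANG', 'Coupé', '3.7', '317'])
partial = select_rows(df, ['FORD', 'MUSTANG', 'Coupé', '0', np.nan])
print(full)
print(partial)
```

Entries that are NaN or '0' simply never reach the dict, so they place no constraint on the result. Note that the string comparison is exact: a Liter value stored as the float 5.0 becomes '5.0', so a criterion of '5' would not match it.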