Using query in a pandas DataFrame
Problem description
Suppose I have the following table:
(image: original data frame)
I added a column called "status" that holds the pairs (gender, senior_management), i.e. [(Female, True), (Male, True), (Male, False), ...] and so on. I am looking for certain conditions, so I defined the list:
conditions = [(Female, True), (Male, False)]
My goal is now to use query to create a new data frame that contains only the rows whose status is one of the conditions. I currently have the following (MyDataFrame is the original, which I want to keep while saving the filtered result under a new name):
NewDataFrame = MyDataFrame.query('status in @conditions')
NewDataFrame.head()
This returns only the column names of the data frame:
(image: Flawed_result)
What is happening here, and how do I fix it?
Recommended answer
It seems that the status column is of type string: when you build it with format, the expression is cast to a string, so it never matches the conditions list of tuples. You can try defining the conditions list as strings instead:
import pandas as pd

df = pd.DataFrame({'gender': ['Male', 'Female', 'Male', 'Female'],
                   'Senior': [True, True, False, False]})
# Build status as a formatted string such as "(Male,True)"
df['status'] = df.apply(lambda row: "({},{})".format(row['gender'], row['Senior']), axis=1)
df
# gender Senior status
#0 Male True (Male,True)
#1 Female True (Female,True)
#2 Male False (Male,False)
#3 Female False (Female,False)

# The conditions are strings in the same format as the status column
conditions = ['(Female,True)', '(Male,False)']
df.query('status in @conditions')
Output:
gender Senior status
1 Female True (Female,True)
2 Male False (Male,False)
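The mismatch behind the empty result can be checked directly. The following is a minimal sketch, assuming status was built with format as in the example above. It uses a plain Python membership test through Series.map rather than query, and the names tuple_conditions, cell_types, mask and empty_result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'gender': ['Male', 'Female', 'Male', 'Female'],
                   'Senior': [True, True, False, False]})
# status is built with str.format, so every cell is a plain string
df['status'] = df.apply(lambda row: "({},{})".format(row['gender'], row['Senior']), axis=1)

# Conditions written as tuples, as in the question (assumed here with quoted names)
tuple_conditions = [('Female', True), ('Male', False)]

# Collect the Python types stored in the status column
cell_types = {type(value).__name__ for value in df['status']}

# A string never compares equal to a tuple, so no row passes the membership test
mask = df['status'].map(lambda value: value in tuple_conditions)
empty_result = df[mask]

print(cell_types)          # the only type present is str
print(empty_result.empty)  # no rows are selected, only the column names remain
```

If the column holds strings, either the conditions must be strings in the same format or the column must be rebuilt as tuples.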
If you want tuples instead of strings, you can build the status column from tuples and then run the query:
df=pd.DataFrame({'gender':['Male','Female','Male','Female'],'Senior':[True,True,False,False]})
df['status']=list(zip(df.gender, df.Senior))
conditions = [('Female',True), ('Male',False)]
df.query('status in @conditions')
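For comparison, the same rows can be selected without a helper column. This is only a sketch of an alternative, not the answer's method: it zips the two columns into pairs and builds a boolean mask with ordinary Python set membership. The names wanted, mask and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'gender': ['Male', 'Female', 'Male', 'Female'],
                   'Senior': [True, True, False, False]})
conditions = [('Female', True), ('Male', False)]

# A set gives fast membership tests for the wanted (gender, Senior) pairs
wanted = set(conditions)

# One boolean per row: True when the row's pair is among the wanted pairs
mask = [pair in wanted for pair in zip(df['gender'], df['Senior'])]

# Boolean indexing returns a new frame; df itself is left unchanged
result = df[mask]
print(result)  # keeps the rows with index 1 and 2
```

Because boolean indexing returns a new object, the original data frame stays available under its old name, which matches the goal stated in the question.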