Using query in a pandas DataFrame


Question

Suppose I have the following table:
[Image: original data frame]

I add a column called "status" that holds the pairs (gender, senior_management), i.e. values such as [(Female, True), (Male, True), (Male, False), ...]. Suppose I am looking for certain conditions, so I define the list:

conditions = [('Female', True), ('Male', False)]

My goal now is to use query to make a new data frame containing only the rows whose status is in conditions. I currently have the following (note that MyDataFrame is the old data frame, and I'm trying to save the result as a new one while keeping the old one):

NewDataFrame = MyDataFrame.query('status in @conditions')
NewDataFrame.head()

This returns only the column names of the data frame, with no rows:
[Image: Flawed_result]
What is happening here, and how do I fix it?

Recommended answer

It seems the status column is of type string: if you build it with format, each pair is cast to a string, so it will never match the tuples in the conditions list. You can instead define the conditions list as strings:

import pandas as pd

df = pd.DataFrame({'gender': ['Male', 'Female', 'Male', 'Female'],
                   'Senior': [True, True, False, False]})
# Build status as a formatted string such as "(Male,True)"
df['status'] = df.apply(lambda row: "({},{})".format(row['gender'], row['Senior']), axis=1)
df
#   gender  Senior          status
#0    Male    True     (Male,True)
#1  Female    True   (Female,True)
#2    Male   False    (Male,False)
#3  Female   False  (Female,False)
conditions = ['(Female,True)', '(Male,False)']

df.query('status in @conditions')

Output:

   gender  Senior         status
1  Female    True  (Female,True)
2    Male   False   (Male,False)
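
To see why the original query matched nothing, it can help to compare a formatted string with a tuple directly. The sketch below assumes, as the answer does, that the status column was built with format; the names row_status and condition are illustrative only.

```python
# A status value built with str.format is a plain string...
row_status = "({},{})".format("Female", True)

# ...while an entry in the original conditions list is a tuple.
condition = ("Female", True)

print(type(row_status).__name__)   # str
print(type(condition).__name__)    # tuple

# A string never compares equal to a tuple, so membership tests fail.
print(row_status == condition)     # False
print(row_status in [condition])   # False
```

Either side can be changed to make the types agree: strings on both sides, or tuples on both sides.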




If you want the status values to be tuples instead of strings, you can build them like this and then run the query:

df = pd.DataFrame({'gender': ['Male', 'Female', 'Male', 'Female'],
                   'Senior': [True, True, False, False]})

# Each status value is now a real tuple, e.g. ('Male', True)
df['status'] = list(zip(df.gender, df.Senior))

conditions = [('Female', True), ('Male', False)]
df.query('status in @conditions')
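
As an alternative sketch that skips the status column entirely, the two columns can be tested against the conditions with MultiIndex.isin. This is not the answer's method, just another option under the same sample data; the names pairs, mask, and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'gender': ['Male', 'Female', 'Male', 'Female'],
                   'Senior': [True, True, False, False]})
conditions = [('Female', True), ('Male', False)]

# Treat each (gender, Senior) row as one key of a MultiIndex.
pairs = pd.MultiIndex.from_frame(df[['gender', 'Senior']])

# Boolean array: True where the row's pair appears in conditions.
mask = pairs.isin(conditions)

# Boolean indexing returns a new frame; df itself is left unchanged.
result = df[mask]
print(result)  # rows with index 1 and 2
```

Because no helper column is created, the original data frame keeps its two columns and the filtered copy can be saved under a new name.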
