Use particular column value as key to search in pandas dataframe

Question

I need to get the rows that match a particular column value used as a key. Below is my pandas DataFrame.

>>> data
     OrderID            TimeStamp  ErrorCode Duration          ResponseType  \
0    3000000  1488948188555841641        NaN      IOC                   NaN   
1    3000000  1488948188556444675          0      NaN     NEW_ORDER_CONFIRM   
2    3000000  1488948188556448153          2      NaN         TRADE_CONFIRM   
3    3000001  1488948658787676012        NaN      IOC                   NaN   
4    3000001  1488948658787811582          1      NaN     NEW_ORDER_CONFIRM   
5    3000001  1488948658787824862          2      NaN         TRADE_CONFIRM   
6    3000002  1488949064945887091        NaN      IOC                   NaN   
7    3000003  1488949109654115659        NaN      IOC                   NaN   
8    3000003  1488949109654294973          1      NaN     NEW_ORDER_CONFIRM   
9    3000003  1488949109654299930      16388      NaN  CANCEL_ORDER_CONFIRM   

I need to select every OrderID whose Duration is IOC (fairly easy, using orders = data.loc[data.Duration == 'IOC', 'OrderID'].unique() as given in the answer) and then get the rows for those selected OrderIDs where Duration is NaN. Each OrderID appears either in a group of three rows or as a single row (for a single-row OrderID such as 3000002, no output or a null row can be returned).

The tricky part is that the ErrorCode in the NEW_ORDER_CONFIRM row is correct, while the one in the TRADE_CONFIRM or CANCEL_ORDER_CONFIRM row is wrong. I want only the correct values in my final output rows.

Expected output, row 1:
   OrderID            TimeStamp  ErrorCode Duration   ResponseType
0  3000000  1488948188555841641          0      IOC  TRADE_CONFIRM

I tried bash: grep IOC loglife | cut -d, -f1 to get the OrderIDs, then grep for each OrderID and NaN. But I need a Python solution, which would be much more efficient.

Recommended answer

I think you can first get all unique values of the OrderID column where Duration is IOC, and then select all the NaN rows by boolean indexing. The mask is built by combining isin with isnull:

# unique() can be omitted, but the solution is then a bit slower on a big df
orders = df.loc[df.Duration == 'IOC', 'OrderID'].unique()

# keep rows that belong to an IOC order and have no Duration of their own
df = df[df.OrderID.isin(orders) & df.Duration.isnull()]
print(df)
   OrderID            TimeStamp  ErrorCode Duration          ResponseType
1  3000000  1488948188556448153        2.0      NaN         TRADE_CONFIRM
3  3000001  1488948658787824862        2.0      NaN         TRADE_CONFIRM
6  3000003  1488949109654299930    16388.0      NaN  CANCEL_ORDER_CONFIRM
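
The filter above keeps the ErrorCode of whichever response rows it selects. The question also asks for the ErrorCode from NEW_ORDER_CONFIRM to appear in the final row, so here is a minimal sketch of one way that could be done; it is not part of the original answer. It assumes the rows of each order are in chronological order (so the last response row is the final one), that each IOC order has at most one NEW_ORDER_CONFIRM row, and it rebuilds the sample frame from the question by hand. The names requests, codes, final and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question's printout.
data = pd.DataFrame({
    "OrderID": [3000000, 3000000, 3000000, 3000001, 3000001,
                3000001, 3000002, 3000003, 3000003, 3000003],
    "TimeStamp": [1488948188555841641, 1488948188556444675,
                  1488948188556448153, 1488948658787676012,
                  1488948658787811582, 1488948658787824862,
                  1488949064945887091, 1488949109654115659,
                  1488949109654294973, 1488949109654299930],
    "ErrorCode": [np.nan, 0, 2, np.nan, 1, 2, np.nan, np.nan, 1, 16388],
    "Duration": ["IOC", np.nan, np.nan, "IOC", np.nan,
                 np.nan, "IOC", "IOC", np.nan, np.nan],
    "ResponseType": [np.nan, "NEW_ORDER_CONFIRM", "TRADE_CONFIRM",
                     np.nan, "NEW_ORDER_CONFIRM", "TRADE_CONFIRM",
                     np.nan, np.nan, "NEW_ORDER_CONFIRM",
                     "CANCEL_ORDER_CONFIRM"],
})

# One request row per IOC order: supplies OrderID, TimeStamp and Duration.
requests = data.loc[data["Duration"] == "IOC",
                    ["OrderID", "TimeStamp", "Duration"]]

# The trusted ErrorCode comes only from the NEW_ORDER_CONFIRM rows.
codes = data.loc[data["ResponseType"] == "NEW_ORDER_CONFIRM",
                 ["OrderID", "ErrorCode"]]

# Final ResponseType per order: the last response row of each OrderID
# (assumes rows are already in chronological order).
responses = data[data["Duration"].isnull()]
final = responses.groupby("OrderID")["ResponseType"].last().reset_index()

# Inner merges drop orders that have no responses, such as 3000002.
result = requests.merge(codes, on="OrderID").merge(final, on="OrderID")
result = result[["OrderID", "TimeStamp", "ErrorCode",
                 "Duration", "ResponseType"]]
print(result)
```

For OrderID 3000000 this gives the request TimeStamp, ErrorCode 0 (shown as a float because the column contains NaN), Duration IOC and ResponseType TRADE_CONFIRM, which matches the expected row in the question. If a null row is wanted for 3000002 instead of dropping it, passing how="left" to both merges should keep it.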
