Comparing Pandas Dataframe Rows & Dropping rows with overlapping dates

Problem description

I have a dataframe filled with trades taken from a trading strategy. The logic in the trading strategy needs to be updated to ensure that a trade isn't taken if the strategy is already in a trade, but that's a different problem. The trade data for many previous trades is read into a dataframe from a csv file.

Here's my problem with the data I have: I need to do a row-by-row comparison of the dataframe to determine whether the EntryDate of row X is earlier than the ExitDate of row X-1.

A sample of my data:

Row  EntryDate   ExitDate
1    2012-07-25  2012-07-27
2    2012-07-26  2012-07-29

Row 2 needs to be deleted because it is a trade that should not have occurred.
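For reference, the condition described above (drop row X when its EntryDate is earlier than the ExitDate of row X-1) can be expressed without a loop by shifting the ExitDate column down one row. This is a minimal sketch, assuming the date columns have been parsed with pd.to_datetime; the names prev_exit, overlaps and cleaned are illustrative, not from the original post.

```python
import pandas as pd

# The two sample trades from the question, parsed as real datetimes
df = pd.DataFrame({
    'EntryDate': pd.to_datetime(['2012-07-25', '2012-07-26']),
    'ExitDate': pd.to_datetime(['2012-07-27', '2012-07-29']),
})

# Previous row's ExitDate aligned with each row (NaT for the first row)
prev_exit = df['ExitDate'].shift()

# True where a trade was entered before the previous trade had exited
overlaps = df['EntryDate'] < prev_exit

# Keep only the trades that do not overlap the row above them
cleaned = df[~overlaps]
print(cleaned)
```

Because shift() leaves NaT in the first row and a `<` comparison against NaT evaluates to False, the first trade is always kept.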

I'm having trouble identifying which rows are duplicates and then dropping them. I tried the approach in answer 3 of this question with some luck but it isn't ideal because I have to manually iterate through the dataframe and read each row's data. My current approach is below and is ugly as can be. I check the dates, and then add them to a new dataframe. Additionally, this approach gives me multiple duplicates in the final dataframe.

import pandas as pd

rows = []
for i in range(len(df) - 1):  # stop one short so i + 1 stays in bounds
    EntryDate = df['EntryDate'].iloc[i]
    ExitDate = df['ExitDate'].iloc[i]
    EntryNextTrade = df['EntryDate'].iloc[i + 1]

    if EntryNextTrade > ExitDate:
        rows.append({'EntryDate': EntryDate, 'ExitDate': ExitDate})

df_trades = pd.DataFrame(rows)  # DataFrame.append no longer exists in pandas 2.x

Any thoughts or ideas on how to more efficiently accomplish this?

You can click here to see a sampling of my data if you want to try to reproduce my actual dataframe.

Recommended answer

You should use a boolean mask for this kind of operation.

One way is to create a helper column holding the next trade's entry date:

df['EntryNextTrade'] = df['EntryDate'].shift(-1)  # shift(-1) pulls each following row's value up
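The direction of shift matters here: with the default periods=1 each row receives the value from the row above it, while periods=-1 pulls the value up from the row below, which is what a "next trade" column needs. A small sketch on three entry dates (the third date is made up for illustration):

```python
import pandas as pd

entry = pd.Series(pd.to_datetime(['2012-07-25', '2012-07-26', '2012-08-01']))

prev_entry = entry.shift()    # default periods=1: each row sees the row above
next_entry = entry.shift(-1)  # periods=-1: each row sees the row below

print(pd.DataFrame({'EntryDate': entry, 'prev': prev_entry, 'next': next_entry}))
```

The vacated slot is filled with NaT: the first row for shift() and the last row for shift(-1).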

Use the new column to create the mask:

msk = df['EntryNextTrade'] > df['ExitDate']

Then use loc to select the sub-DataFrame where msk is True, keeping only the specified columns:

df.loc[msk, ['EntryDate', 'ExitDate']]
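A mask built from a single shift compares each trade only with its immediate neighbour. If an overlapping trade is dropped, the trade after it should arguably be compared with the last trade that was kept, which one shift cannot express. The sketch below shows that sequential rule as a different technique from the answer's mask; drop_overlapping_trades is a hypothetical helper (not a pandas function), and it assumes the frame is sorted by EntryDate with datetime columns.

```python
import pandas as pd

def drop_overlapping_trades(df):
    """Keep a trade only if it starts on or after the exit of the last kept trade.

    Hypothetical helper; assumes df is sorted by EntryDate and both
    EntryDate and ExitDate are datetime columns.
    """
    keep = []
    last_exit = pd.NaT
    for entry, exit_ in zip(df['EntryDate'], df['ExitDate']):
        if pd.isna(last_exit) or entry >= last_exit:
            keep.append(True)
            last_exit = exit_  # this trade is now the one to beat
        else:
            keep.append(False)  # still in a trade; this one should not occur
    return df.loc[keep]

# First two rows are the question's sample; the last two are invented
trades = pd.DataFrame({
    'EntryDate': pd.to_datetime(['2012-07-25', '2012-07-26', '2012-07-28', '2012-08-01']),
    'ExitDate': pd.to_datetime(['2012-07-27', '2012-07-29', '2012-07-30', '2012-08-03']),
})

result = drop_overlapping_trades(trades)
print(result)
```

Here the 2012-07-28 trade is kept because it starts after the 2012-07-27 exit of the last kept trade, whereas comparing it with the previous row's ExitDate (2012-07-29) would have dropped it. The loop is O(n) in plain Python, which should be acceptable for a typical trade log.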
