Pandas: how to merge two dataframes on offset dates?

Views: 118 Published: 2019/9/19 16:10:17 Tags: date join pandas merge offset

Problem description

I'd like to merge two dataframes, df1 and df2, based on whether rows of df2 fall within a date range of 3 to 6 months after rows of df1. For example:

df1 (for each company I have quarterly data):

    company DATADATE
0   012345  2005-06-30
1   012345  2005-09-30
2   012345  2005-12-31
3   012345  2006-03-31
4   123456  2005-01-31
5   123456  2005-03-31
6   123456  2005-06-30
7   123456  2005-09-30

df2 (for each company I have event dates that can fall on any day):

    company EventDate
0   012345  2005-07-28 <-- won't get merged b/c not within date range
1   012345  2005-10-12
2   123456  2005-05-15
3   123456  2005-05-17
4   123456  2005-05-25
5   123456  2005-05-30
6   123456  2005-08-08
7   123456  2005-11-29
8   abcxyz  2005-12-31 <-- won't be merged because company not in df1
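
To reproduce the problem, the two sample frames above can be built as follows. This is only a convenience sketch: it assumes the company codes are strings (so the leading zero in "012345" is preserved) and that both date columns are parsed with pd.to_datetime, neither of which the question states explicitly.

```python
import pandas as pd

# Sample data copied from the tables above. Company codes are kept as
# strings (an assumption) so that leading zeros such as "012345" survive.
df1 = pd.DataFrame({
    "company": ["012345"] * 4 + ["123456"] * 4,
    "DATADATE": pd.to_datetime([
        "2005-06-30", "2005-09-30", "2005-12-31", "2006-03-31",
        "2005-01-31", "2005-03-31", "2005-06-30", "2005-09-30",
    ]),
})

df2 = pd.DataFrame({
    "company": ["012345"] * 2 + ["123456"] * 6 + ["abcxyz"],
    "EventDate": pd.to_datetime([
        "2005-07-28", "2005-10-12",
        "2005-05-15", "2005-05-17", "2005-05-25", "2005-05-30",
        "2005-08-08", "2005-11-29",
        "2005-12-31",
    ]),
})
```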

Ideal merged df: rows of df2 whose EventDate falls 3 to 6 months (i.e. one quarter) after a DATADATE in df1 are merged onto that df1 row:

    company DATADATE    EventDate
0   012345  2005-06-30  2005-10-12
1   012345  2005-09-30  NaN   <-- NaN because no EventDates fell in this range
2   012345  2005-12-31  NaN
3   012345  2006-03-31  NaN
4   123456  2005-01-31  2005-05-15
5   123456  2005-01-31  2005-05-17
6   123456  2005-01-31  2005-05-25
7   123456  2005-01-31  2005-05-30
8   123456  2005-03-31  2005-08-08
9   123456  2005-06-30  2005-11-29
10  123456  2005-09-30  NaN

I am trying to adapt a related topic, [Merge pandas DataFrames based on irregular time intervals], by adding start_time and end_time columns to df1 that denote 3 months (start_time) and 6 months (end_time) after DATADATE, and then using np.searchsorted(). This case is a bit trickier, though, because I'd like to merge on a company-by-company basis.
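
The idea described above (window columns plus np.searchsorted, applied one company at a time) might look roughly like the sketch below. It only counts the events that fall in each window rather than building the merged frame. The function name count_events_in_window is hypothetical, the sample data is a trimmed copy of the tables above, and treating both window ends as inclusive is an assumption, since the question does not say.

```python
import numpy as np
import pandas as pd

def count_events_in_window(df1, df2):
    """Per df1 row, count same-company events with start <= EventDate <= end."""
    # Window bounds: 3 and 6 month-ends after DATADATE.
    start = df1["DATADATE"] + pd.offsets.MonthEnd(3)
    end = df1["DATADATE"] + pd.offsets.MonthEnd(6)

    # One sorted array of event dates per company.
    events_by_company = {
        company: np.sort(group["EventDate"].to_numpy())
        for company, group in df2.groupby("company")
    }

    counts = pd.Series(0, index=df1.index)
    for company, idx in df1.groupby("company").groups.items():
        ev = events_by_company.get(company)
        if ev is None:  # company has no events at all
            continue
        # Positions in the sorted event array that bracket each window.
        lo = np.searchsorted(ev, start.loc[idx].to_numpy(), side="left")
        hi = np.searchsorted(ev, end.loc[idx].to_numpy(), side="right")
        counts.loc[idx] = hi - lo
    return counts

# Trimmed sample: the four quarters of company 123456 and its events.
df1 = pd.DataFrame({
    "company": ["123456"] * 4,
    "DATADATE": pd.to_datetime(
        ["2005-01-31", "2005-03-31", "2005-06-30", "2005-09-30"]),
})
df2 = pd.DataFrame({
    "company": ["123456"] * 6 + ["abcxyz"],
    "EventDate": pd.to_datetime(
        ["2005-05-15", "2005-05-17", "2005-05-25", "2005-05-30",
         "2005-08-08", "2005-11-29", "2005-12-31"]),
})

counts = count_events_in_window(df1, df2)
print(counts.tolist())  # [4, 1, 1, 0]
```

Grouping first keeps each searchsorted call confined to one company's events, which is what makes the company-by-company requirement tractable.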

Recommended answer

This is my solution, based on the algorithm that Ami Tavory suggested in this thread:

import bisect
import pandas as pd

# assumes DATADATE and EventDate are already datetime columns
# (e.g. converted with pd.to_datetime)

# find the date offsets that define each date range and store them as
# extra columns: 3 month-ends (start_time) to 6 month-ends (end_time) after DATADATE
df1['start_time'] = df1.DATADATE + pd.offsets.MonthEnd(3)
df1['end_time'] = df1.DATADATE + pd.offsets.MonthEnd(6)

# find unique company names in both dfs
unique_companies_df1 = df1.company.unique()
unique_companies_df2 = df2.company.unique()

# sort df1 by company and DATADATE, so we can iterate in a sensible order
sorted_df1 = df1.sort_values(['company', 'DATADATE']).reset_index(drop=True)

# collect the merged pieces here and concatenate them once at the end
pieces = []

# iterate through each company in df1, find
# that company in sorted df2, then for each
# DATADATE quarter of df1, bisect df2 in the
# correct locations (i.e. start_time to end_time)

for cmpny in unique_companies_df1:

    if cmpny in unique_companies_df2:  # if this company is in both dfs, take the rows associated with it
        selected_df2 = df2[df2.company == cmpny].sort_values('EventDate').reset_index(drop=True)
        selected_df1 = sorted_df1[sorted_df1.company == cmpny].reset_index(drop=True)
        event_dates = list(selected_df2.EventDate)  # sorted list for bisect

        for quarter in range(len(selected_df1)):  # iterate through each DATADATE quarter in df1
            # bisect_right skips events on or before start_time
            lo = bisect.bisect_right(event_dates, selected_df1.start_time[quarter])
            # bisect_left stops before events on or after end_time
            hi = bisect.bisect_left(event_dates, selected_df1.end_time[quarter])
            # positional slice (iloc), so row hi itself is excluded
            df_right = selected_df2.iloc[lo:hi].copy()
            df_left = selected_df1.loc[[quarter]]  # one-row frame, dtypes preserved

            if len(df_right) == 0:  # no EventDates in range: merge a placeholder row with NaT instead
                df_right = pd.DataFrame({'company': [cmpny], 'EventDate': [pd.NaT]})

            # merge the df1 company quarter with all df2 rows that fell within the date range
            pieces.append(pd.merge(df_left, df_right, how='inner', on='company'))

# companies that appear only in df1 are skipped by the loop above
df3 = pd.concat(pieces, ignore_index=True) if pieces else pd.DataFrame()
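
As an alternative to the explicit loop above, and not taken from the original answer, the same result can be sketched with two merges and a filter: pair every quarter with every event of the same company, keep the pairs inside the window, then left-merge back onto df1 so quarters without events keep a NaT row. The function name merge_on_offset_window is hypothetical. The strict inequalities mirror the bisect_right/bisect_left choice above, and the sketch assumes each (company, DATADATE) pair is unique in df1.

```python
import pandas as pd

def merge_on_offset_window(df1, df2):
    """Left-merge df2 events onto df1 quarters when start < EventDate < end."""
    left = df1.copy()
    left["start_time"] = left["DATADATE"] + pd.offsets.MonthEnd(3)
    left["end_time"] = left["DATADATE"] + pd.offsets.MonthEnd(6)

    # Every quarter paired with every event of the same company.
    pairs = left.merge(df2, on="company", how="inner")
    in_window = pairs[(pairs["EventDate"] > pairs["start_time"])
                      & (pairs["EventDate"] < pairs["end_time"])]

    # Left-merge back so quarters with no matching event get NaT.
    result = left.merge(
        in_window[["company", "DATADATE", "EventDate"]],
        on=["company", "DATADATE"],
        how="left",
    )
    return (result.sort_values(["company", "DATADATE", "EventDate"])
                  .reset_index(drop=True))

# Sample data from the question.
df1 = pd.DataFrame({
    "company": ["012345"] * 4 + ["123456"] * 4,
    "DATADATE": pd.to_datetime([
        "2005-06-30", "2005-09-30", "2005-12-31", "2006-03-31",
        "2005-01-31", "2005-03-31", "2005-06-30", "2005-09-30"]),
})
df2 = pd.DataFrame({
    "company": ["012345"] * 2 + ["123456"] * 6 + ["abcxyz"],
    "EventDate": pd.to_datetime([
        "2005-07-28", "2005-10-12", "2005-05-15", "2005-05-17",
        "2005-05-25", "2005-05-30", "2005-08-08", "2005-11-29",
        "2005-12-31"]),
})

result = merge_on_offset_window(df1, df2)
print(result[["company", "DATADATE", "EventDate"]])
```

The intermediate pairs frame holds one row per (quarter, event) combination within a company, so this trades memory for brevity; for companies with very many events the bisect loop may be the more economical choice.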
