Occurrence of a number between two specific datetime ranges in Pandas


Question

I have two CSV files, as shown below.

  1. I want a new column, Difference, where:
    • if a mobile number occurs within the date range Book_Date to App_Date, Difference is the number of days between App_Date and Occur_Date;
    • otherwise, Difference is NaN.

csv_1

Mobile_Number    Book_Date       App_Date

503477334    2018-10-12       2018-10-18
506002884    2018-10-12       2018-10-19
501022162    2018-10-12       2018-10-16
503487338    2018-10-13       2018-10-13
506012887    2018-10-13       2018-10-21
503427339    2018-10-14       2018-10-17

csv_2

Mobile_Number    Occur_Date    

503477334        2018-10-16
506002884        2018-10-21
501022162        2018-10-15
503487338        2018-10-13
501428449        2018-10-18
506012887        2018-10-14

I want to add a new column to csv_1: if a mobile number's Occur_Date in csv_2 falls between Book_Date and App_Date, the column holds the difference between App_Date and Occur_Date; otherwise it holds NaN. The output should be:

Output

Mobile_Number    Book_Date       App_Date   Difference

503477334    2018-10-12       2018-10-18       2
506002884    2018-10-12       2018-10-19      -2
501022162    2018-10-12       2018-10-16       1
503487338    2018-10-13       2018-10-13       0
506012887    2018-10-13       2018-10-21       7 
503427339    2018-10-14       2018-10-17       NaN

What if I want to match rows on each unique combination of Category and Mobile_Number in the two CSV files? How can I do the same?

csv_1

Category     Mobile_Number   Book_Date       App_Date

A              503477334    2018-10-12       2018-10-18
B              503477334    2018-10-07       2018-10-16
C              501022162    2018-10-12       2018-10-16
A              503487338    2018-10-13       2018-10-13
C              506012887    2018-10-13       2018-10-21
E              503427339    2018-10-14       2018-10-17

csv_2

Category     Mobile_Number    Occur_Date    

A              503477334        2018-10-16
B              503477334        2018-10-13
A              501022162        2018-10-15
A              503487338        2018-10-13
F              501428449        2018-10-18
C              506012887        2018-10-14

I want the output to be matched on both Mobile_Number and Category.

Output

Category     Mobile_Number    Book_Date       App_Date   Difference

A              503477334    2018-10-12       2018-10-18       2
B              503477334    2018-10-07       2018-10-16       3
C              501022162    2018-10-12       2018-10-16       NaN
A              503487338    2018-10-13       2018-10-13       0
C              506012887    2018-10-13       2018-10-21       7 
E              503427339    2018-10-14       2018-10-17       NaN

Recommended answer

Use Series.map to build a new Series of Occur_Date values matched by Mobile_Number, test whether each value lies between the two date columns with Series.between, then assign values by that mask with numpy.where:

import numpy as np
import pandas as pd

# df1 and df2 hold the contents of csv_1 and csv_2
df1['Book_Date'] = pd.to_datetime(df1['Book_Date'])
df1['App_Date'] = pd.to_datetime(df1['App_Date'])
df2['Occur_Date'] = pd.to_datetime(df2['Occur_Date'])

# lookup Series: Mobile_Number -> first Occur_Date for that number
s1 = df2.drop_duplicates('Mobile_Number').set_index('Mobile_Number')['Occur_Date']
# Occur_Date aligned to the rows of df1 (NaT where the number is absent from df2)
s2 = df1['Mobile_Number'].map(s1)
df1['Occur_Date'] = s2

# True where Occur_Date lies within [Book_Date, App_Date]
m = s2.between(df1['Book_Date'], df1['App_Date'])

# difference in days without the range check
df1['Difference1'] = df1['App_Date'].sub(s2).dt.days
# difference in days only where Occur_Date is within the range, else NaN
df1['Difference2'] = np.where(m, df1['App_Date'].sub(s2).dt.days, np.nan)
print(df1)
   Mobile_Number  Book_Date   App_Date Occur_Date  Difference1  Difference2
0      503477334 2018-10-12 2018-10-18 2018-10-16          2.0          2.0
1      506002884 2018-10-12 2018-10-19 2018-10-21         -2.0          NaN
2      501022162 2018-10-12 2018-10-16 2018-10-15          1.0          1.0
3      503487338 2018-10-13 2018-10-13 2018-10-13          0.0          0.0
4      506012887 2018-10-13 2018-10-21 2018-10-14          7.0          7.0
5      503427339 2018-10-14 2018-10-17        NaT          NaN          NaN
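
For readers who want to run the map-based approach end to end, here is a minimal self-contained sketch. It assumes the two CSV files have already been loaded, so the frames are built inline from the question's sample data; the variable names occur_by_number, occur and in_range are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (csv_1 and csv_2).
df1 = pd.DataFrame({
    "Mobile_Number": [503477334, 506002884, 501022162,
                      503487338, 506012887, 503427339],
    "Book_Date": ["2018-10-12", "2018-10-12", "2018-10-12",
                  "2018-10-13", "2018-10-13", "2018-10-14"],
    "App_Date": ["2018-10-18", "2018-10-19", "2018-10-16",
                 "2018-10-13", "2018-10-21", "2018-10-17"],
})
df2 = pd.DataFrame({
    "Mobile_Number": [503477334, 506002884, 501022162,
                      503487338, 501428449, 506012887],
    "Occur_Date": ["2018-10-16", "2018-10-21", "2018-10-15",
                   "2018-10-13", "2018-10-18", "2018-10-14"],
})

# Parse the date columns so they can be compared and subtracted.
for col in ("Book_Date", "App_Date"):
    df1[col] = pd.to_datetime(df1[col])
df2["Occur_Date"] = pd.to_datetime(df2["Occur_Date"])

# Lookup Series: Mobile_Number -> first Occur_Date seen for that number.
occur_by_number = (
    df2.drop_duplicates("Mobile_Number")
       .set_index("Mobile_Number")["Occur_Date"]
)

# Occur_Date aligned to df1's rows; NaT where the number is not in df2.
occur = df1["Mobile_Number"].map(occur_by_number)

# Inclusive range test; rows with NaT evaluate to False.
in_range = occur.between(df1["Book_Date"], df1["App_Date"])

# Day difference where the occurrence is in range, NaN otherwise.
df1["Difference"] = np.where(in_range, (df1["App_Date"] - occur).dt.days, np.nan)
print(df1)
```

With this data, the row for 506002884 gets NaN rather than -2, because its Occur_Date (2018-10-21) falls after App_Date. Note that drop_duplicates keeps only the first Occur_Date per number, so later occurrences of the same number in csv_2 are ignored.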

To join on two columns, use merge instead of map:

df1['Book_Date'] = pd.to_datetime(df1['Book_Date'])
df1['App_Date'] = pd.to_datetime(df1['App_Date'])
df2['Occur_Date'] = pd.to_datetime(df2['Occur_Date'])

# left join keeps every row of df1; unmatched rows get NaT in Occur_Date
df3 = df1.merge(df2, on=['Category','Mobile_Number'], how='left')
print(df3)
  Category  Mobile_Number  Book_Date   App_Date Occur_Date
0        A      503477334 2018-10-12 2018-10-18 2018-10-16
1        B      503477334 2018-10-07 2018-10-16 2018-10-13
2        C      501022162 2018-10-12 2018-10-16        NaT
3        A      503487338 2018-10-13 2018-10-13 2018-10-13
4        C      506012887 2018-10-13 2018-10-21 2018-10-14
5        E      503427339 2018-10-14 2018-10-17        NaT

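# True where Occur_Date lies within [Book_Date, App_Date]; NaT compares as False
m = df3['Occur_Date'].between(df3['Book_Date'], df3['App_Date'])

# difference in days only where Occur_Date is within the range, else NaN
df3['Difference2'] = np.where(m, df3['App_Date'].sub(df3['Occur_Date']).dt.days, np.nan)
print(df3)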
  Category  Mobile_Number  Book_Date   App_Date Occur_Date  Difference2
0        A      503477334 2018-10-12 2018-10-18 2018-10-16          2.0
1        B      503477334 2018-10-07 2018-10-16 2018-10-13          3.0
2        C      501022162 2018-10-12 2018-10-16        NaT          NaN
3        A      503487338 2018-10-13 2018-10-13 2018-10-13          0.0
4        C      506012887 2018-10-13 2018-10-21 2018-10-14          7.0
5        E      503427339 2018-10-14 2018-10-17        NaT          NaN
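
The merge-based variant can be sketched as a runnable script too. This version is an illustration under a few assumptions: the frames are built inline from the question's second data set, Series.where is swapped in for numpy.where (it masks out-of-range rows with NaN in the same way), and validate="many_to_one" is an added guard, not part of the original answer, that makes merge raise an error if csv_2 contains the same (Category, Mobile_Number) pair more than once.

```python
import pandas as pd

# Sample data copied from the question's second example.
df1 = pd.DataFrame({
    "Category": ["A", "B", "C", "A", "C", "E"],
    "Mobile_Number": [503477334, 503477334, 501022162,
                      503487338, 506012887, 503427339],
    "Book_Date": pd.to_datetime(["2018-10-12", "2018-10-07", "2018-10-12",
                                 "2018-10-13", "2018-10-13", "2018-10-14"]),
    "App_Date": pd.to_datetime(["2018-10-18", "2018-10-16", "2018-10-16",
                                "2018-10-13", "2018-10-21", "2018-10-17"]),
})
df2 = pd.DataFrame({
    "Category": ["A", "B", "A", "A", "F", "C"],
    "Mobile_Number": [503477334, 503477334, 501022162,
                      503487338, 501428449, 506012887],
    "Occur_Date": pd.to_datetime(["2018-10-16", "2018-10-13", "2018-10-15",
                                  "2018-10-13", "2018-10-18", "2018-10-14"]),
})

# Left join on both keys; rows of df1 without a match get NaT in Occur_Date.
# validate="many_to_one" (an added safety check) fails fast if df2 has
# duplicate key pairs, which would otherwise multiply rows of df1.
df3 = df1.merge(df2, on=["Category", "Mobile_Number"],
                how="left", validate="many_to_one")

# Inclusive range test; NaT rows evaluate to False.
in_range = df3["Occur_Date"].between(df3["Book_Date"], df3["App_Date"])

# Keep the day difference where in range, NaN elsewhere.
df3["Difference"] = (df3["App_Date"] - df3["Occur_Date"]).dt.days.where(in_range)
print(df3)
```

Here the row for category C and number 501022162 ends up as NaN because csv_2 lists that number only under category A, so the two-column join finds no match.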
