pd.merge_asof fails with 'ValueError: left keys must be sorted' on second run


Problem description

I'm trying to merge two datasets on the closest matching date_times.

I have two timestamps per record, one for the open event and one for the close event.

The merge_asof runs fine on the open date, but raises 'ValueError: left keys must be sorted' on the second date_time.

I sort by the relevant date_time on both occasions.

First dataframe:

   idtbl_station_manager     date_time_stamp fld_station_number  \
0                   1121 2017-09-19 15:41:24            AM00571   
1                   1122 2017-09-19 15:41:24            AM00572   
2                   1123 2017-09-19 15:41:24            AM00573   

  fld_grid_number fld_status  fld_station_number_int  \
0     VOY-024-001     CLOSED                     571   
1     VOY-024-002     CLOSED                     572   
2     VOY-024-003     CLOSED                     573   

                  fld_activities date_time_stamp_open fld_lat_open  \
0  Drift Net,CTD-Overside,Dredge  2017-04-13 07:23:35                
1  Drift Net,CTD-Overside,Dredge  2017-04-13 10:15:07   4649.028 S   
2  Drift Net,CTD-Overside,Dredge  2017-04-13 13:15:42   4648.497 S   

  fld_lon_open date_time_stamp_close fld_lat_close fld_lon_close  
0  03759.143 E   2017-04-13 09:51:18    4647.361 S   03759.142 E  
1  03759.143 E   2017-04-13 12:11:00    4647.344 S   03759.143 E  
2                2017-04-13 15:09:26    4647.344 S   03759.143 E  

Second dataframe:

         idtbl_gpgga     date_time_stamp    fld_utc   fld_lat fld_lat_dir  \
1179828      1179829 2017-04-04 02:00:04  000005.00  3354.138           S   
0                  1 2017-04-04 02:00:05  000006.00  3354.138           S   
1                  2 2017-04-04 02:00:07  000008.00  3354.138           S   

          fld_lon fld_lon_dir fld_gps_quality fld_nos fld_hdop fld_alt  \
1179828  1825.557           E               1      10      0.9    21.6   
0        1825.557           E               1      10      0.9    21.6   
1        1825.557           E               1      10      0.9    21.6   

        fld_unit_alt fld_alt_geoid fld_unit_alt_geoid fld_dgps_age fld_dgps_id  
1179828            M          31.9                  M                        0  
0                  M          31.9                  M                        0  
1                  M          31.9                  M                        0  

This works as expected:

# First we grab the open time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_open", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_open = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_open'], right_on=['date_time_stamp'], direction="nearest")

pd_open["fld_lat_open"] = pd_open["fld_lat"] + ' ' +  pd_open["fld_lat_dir"]
pd_open["fld_lon_open"] = pd_open["fld_lon"] + ' ' +  pd_open["fld_lon_dir"]     

This fails with:

'ValueError: left keys must be sorted'

# Now we grab the close time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_close", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_close = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_close'], right_on=['date_time_stamp'], direction="nearest")

pd_close["fld_lat_close"] = pd_close["fld_lat"] + ' ' + pd_close["fld_lat_dir"]
# Longitude goes into fld_lon_close (the original line overwrote fld_lat_close)
pd_close["fld_lon_close"] = pd_close["fld_lon"] + ' ' + pd_close["fld_lon_dir"]

Any suggestions would be greatly appreciated.

Recommended answer

As noted by @JohnE, there were NaT values present in the df_stationManager dataframe.
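One way to check for this kind of problem is to count the missing values in the key column before merging. The snippet below is only a sketch: the small frame is a hypothetical stand-in for df_stationManager, with one close time deliberately left empty. Sorting does not help here, because sort_values places NaT at the end by default and the key still contains a null.

```python
import pandas as pd

# Hypothetical sample standing in for df_stationManager;
# the second station has no close time, so it becomes NaT.
df_stationManager = pd.DataFrame({
    "fld_station_number": ["AM00571", "AM00572", "AM00573"],
    "date_time_stamp_close": pd.to_datetime(
        ["2017-04-13 09:51:18", None, "2017-04-13 15:09:26"]
    ),
})

key = "date_time_stamp_close"

# Count the rows whose merge key is missing and show them.
n_missing = int(df_stationManager[key].isna().sum())
rows_missing = df_stationManager[df_stationManager[key].isna()]

print(f"{n_missing} row(s) with a missing {key}")
print(rows_missing)
```

If the count is non-zero, those rows need to be dropped or filled before the column can be used as a merge_asof key.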

Resolved by cleaning before merging:

df_stationManager = df_stationManager.dropna() 
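Note that dropna() with no arguments removes every row that has a missing value in any column. A narrower variant, sketched here with hypothetical sample data and the column names from the question, drops only the rows whose merge key is NaT. The exact error text for the failing call may differ between pandas versions, so the sketch only relies on a ValueError being raised.

```python
import pandas as pd

# Hypothetical miniature frames; column names follow the question.
stations = pd.DataFrame({
    "fld_station_number": ["AM00571", "AM00572", "AM00573"],
    "date_time_stamp_close": pd.to_datetime(
        ["2017-04-13 09:51:18", None, "2017-04-13 15:09:26"]
    ),
})
gps = pd.DataFrame({
    "date_time_stamp": pd.to_datetime(
        ["2017-04-13 09:51:00", "2017-04-13 12:11:00", "2017-04-13 15:09:30"]
    ),
    "fld_lat": ["4647.361", "4647.344", "4647.344"],
})

# Sorting alone is not enough: the NaT row is still part of the left key.
try:
    pd.merge_asof(
        stations.sort_values("date_time_stamp_close"),
        gps,
        left_on="date_time_stamp_close",
        right_on="date_time_stamp",
        direction="nearest",
    )
    failed = False
except ValueError as exc:
    failed = True
    print("merge_asof failed:", exc)

# Drop only the rows with a missing merge key, then sort and merge.
clean = (
    stations.dropna(subset=["date_time_stamp_close"])
    .sort_values("date_time_stamp_close")
)
merged = pd.merge_asof(
    clean,
    gps,
    left_on="date_time_stamp_close",
    right_on="date_time_stamp",
    direction="nearest",
)
print(merged)
```

Using subset keeps stations that are complete for this merge even if other, unrelated columns contain missing values.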
