Merge dataframe object and timedelta64


Question

I have a DataFrame with a column of dtype datetime64:

df:
time           timestamp
18053.401736   2019-06-06 09:38:30+00:00
18053.418252   2019-06-06 10:02:17+00:00
18053.424514   2019-06-06 10:11:18+00:00
18053.454132   2019-06-06 10:53:57+00:00
Name: timestamp, dtype: datetime64[ns, UTC]

and a Series of dtype timedelta64:

ss:
         ref_time
0       0 days 09:00:00
1       0 days 09:00:01
2       0 days 09:00:02
3       0 days 09:00:03
4       0 days 09:00:04
              ...      
21596   0 days 14:59:56
21597   0 days 14:59:57
21598   0 days 14:59:58
21599   0 days 14:59:59
21600   0 days 15:00:00
Name: timeonly, Length: 21601, dtype: timedelta64[ns]

I want to merge the two so that the output df has values only where the timestamp coincides with one of the values in the Series:

Desired output:
time           timestamp                     ref_time
Nan            Nan                           09:00:00
...            ...                           ...
Nan            Nan                           09:38:29
18053.401736   2019-06-06 09:38:30+00:00     09:38:30
Nan            Nan                           09:38:31
...            ...                           ...
18053.418252   2019-06-06 10:02:17+00:00     10:02:17
Nan            Nan                           10:02:18
Nan            Nan                           10:02:19
...            ...                           ...
18053.424514   2019-06-06 10:11:18+00:00     10:11:18
...            ...                           ...
18053.454132   2019-06-06 10:53:57+00:00     10:53:57

However, if I convert 'timestamp' to a time-only value, I get an object dtype and I can't merge it with ss.

df['timestamp'].dtype            # --> datetime64[ns, UTC]
df['timeonly'] = df['timestamp'].dt.time
df['timeonly'].dtype             # --> object
df.merge(ss, how='outer', on=['timeonly'])
# ValueError: You are trying to merge on object and timedelta64[ns] columns. If you wish to proceed you should use pd.concat

but using concat as suggested doesn't give me the desired output. How can I merge/join the DataFrame and the Series? Pandas version 1.1.5

Recommended answer

The main challenge here is to get everything into compatible date types. Using your examples, slightly modified, as inputs:

import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
"""
time,timestamp
18053.401736,2019-06-06 09:38:30+00:00
18053.418252,2019-06-06 10:02:17+00:00
18053.424514,2019-06-06 10:11:18+00:00
18053.454132,2019-06-06 10:53:57+00:00
"""))
df['timestamp'] = pd.to_datetime(df['timestamp'])

from datetime import timedelta
sdf = pd.read_csv(StringIO(
"""
ref_time
0 days 09:00:00
0 days 09:00:01
0 days 09:00:02
0 days 09:00:03
0 days 09:00:04
0 days 09:38:30
0 days 10:02:17
0 days 14:59:56
0 days 14:59:57
0 days 14:59:58
0 days 14:59:59
0 days 15:00:00
"""))
sdf['ref_time'] = pd.to_timedelta(sdf['ref_time'])

The dtypes here match those in your question, which is important.

First we work out base_date, because the timedeltas have to be converted into datetimes. Note that we set it to midnight of the relevant date via floor('D'); round('1d') gives the same result for this sample, but it would jump to the next day if the first timestamp fell after noon.

base_date = df['timestamp'].iloc[0].floor('D').to_pydatetime()
base_date

Output

datetime.datetime(2019, 6, 6, 0, 0, tzinfo=<UTC>)
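The choice of how the timestamp is snapped to midnight matters once timestamps fall in the afternoon. A small sketch comparing round and floor on two timestamps taken from the sample data (the variable names are illustrative only):

```python
import pandas as pd

# Two tz-aware timestamps from the same day: one before noon, one after.
morning = pd.Timestamp("2019-06-06 09:38:30+00:00")
afternoon = pd.Timestamp("2019-06-06 14:59:56+00:00")

# round() snaps to the NEAREST midnight, so an afternoon time moves to the next day.
morning_round = morning.round("D")
afternoon_round = afternoon.round("D")

# floor() always snaps back to midnight of the same calendar day.
afternoon_floor = afternoon.floor("D")

print(morning_round, afternoon_round, afternoon_floor)
```

If the first row of df could ever be an afternoon timestamp, floor (or Timestamp.normalize) is the safer way to obtain the base date.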

Next we add the timedeltas from sdf to base_date:

sdf['ref_dt'] = sdf['ref_time'] + base_date

Now sdf['ref_dt'] and df['timestamp'] are in the same 'units' and of the same type, so we can merge:

sdf.merge(df, left_on = 'ref_dt', right_on = 'timestamp', how = 'left')

Output

    ref_time         ref_dt                        time  timestamp
--  ---------------  -------------------------  -------  -------------------------
 0  0 days 09:00:00  2019-06-06 09:00:00+00:00    nan    NaT
 1  0 days 09:00:01  2019-06-06 09:00:01+00:00    nan    NaT
 2  0 days 09:00:02  2019-06-06 09:00:02+00:00    nan    NaT
 3  0 days 09:00:03  2019-06-06 09:00:03+00:00    nan    NaT
 4  0 days 09:00:04  2019-06-06 09:00:04+00:00    nan    NaT
 5  0 days 09:38:30  2019-06-06 09:38:30+00:00  18053.4  2019-06-06 09:38:30+00:00
 6  0 days 10:02:17  2019-06-06 10:02:17+00:00  18053.4  2019-06-06 10:02:17+00:00
 7  0 days 14:59:56  2019-06-06 14:59:56+00:00    nan    NaT
 8  0 days 14:59:57  2019-06-06 14:59:57+00:00    nan    NaT
 9  0 days 14:59:58  2019-06-06 14:59:58+00:00    nan    NaT
10  0 days 14:59:59  2019-06-06 14:59:59+00:00    nan    NaT
11  0 days 15:00:00  2019-06-06 15:00:00+00:00    nan    NaT
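To check which rows actually matched, one option is the indicator flag of merge. A minimal sketch on a cut-down version of the inputs (the shortened data and the names merged and matched are illustrative, not part of the original answer):

```python
import pandas as pd

# Cut-down versions of df and sdf with the same dtypes as above.
df = pd.DataFrame({
    "time": [18053.401736, 18053.418252],
    "timestamp": pd.to_datetime(
        ["2019-06-06 09:38:30+00:00", "2019-06-06 10:02:17+00:00"], utc=True
    ),
})
sdf = pd.DataFrame(
    {"ref_time": pd.to_timedelta(["09:00:00", "09:38:30", "10:02:17", "15:00:00"])}
)

# Midnight of the relevant date, then absolute datetimes for every reference time.
base_date = df["timestamp"].iloc[0].floor("D")
sdf["ref_dt"] = base_date + sdf["ref_time"]

# indicator=True adds a "_merge" column: "both" for matches, "left_only" otherwise.
merged = sdf.merge(
    df, left_on="ref_dt", right_on="timestamp", how="left", indicator=True
)
matched = merged[merged["_merge"] == "both"]
print(matched[["ref_time", "time"]])
```

Rows marked "left_only" are the reference times with no corresponding timestamp, which is where time and timestamp come out as NaN/NaT.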

and we see that the merge fills in values exactly where the timestamps coincide.
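A possible alternative, sketched here under the assumption that only the time of day matters (the date is ignored), is to go the other way: turn the timestamp into a timedelta since midnight, so that both merge keys are timedelta64. The shortened data below is illustrative, and the column name timeonly follows the question:

```python
import pandas as pd

df = pd.DataFrame({
    "time": [18053.401736, 18053.418252],
    "timestamp": pd.to_datetime(
        ["2019-06-06 09:38:30+00:00", "2019-06-06 10:02:17+00:00"], utc=True
    ),
})
ss = pd.Series(
    pd.to_timedelta(["09:00:00", "09:38:30", "10:02:17"]), name="timeonly"
)

# Time of day as a timedelta: timestamp minus midnight of the same day.
# Unlike .dt.time, this keeps a timedelta64 dtype rather than object.
df["timeonly"] = df["timestamp"] - df["timestamp"].dt.floor("D")

# Both key columns are now timedelta64, so a plain left merge works.
out = ss.to_frame().merge(df, on="timeonly", how="left")
print(out)
```

Because the date is discarded, timestamps from different days with the same time of day would all match the same reference row; the base_date approach above avoids that.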
