Merge dataframe object and timedelta64
Question
I have a dataframe with a column of dtype datetime64
df:
time timestamp
18053.401736 2019-06-06 09:38:30+00:00
18053.418252 2019-06-06 10:02:17+00:00
18053.424514 2019-06-06 10:11:18+00:00
18053.454132 2019-06-06 10:53:57+00:00
Name: timestamp, dtype: datetime64[ns, UTC]
and a Series of dtype timedelta64
ss:
ref_time
0 0 days 09:00:00
1 0 days 09:00:01
2 0 days 09:00:02
3 0 days 09:00:03
4 0 days 09:00:04
...
21596 0 days 14:59:56
21597 0 days 14:59:57
21598 0 days 14:59:58
21599 0 days 14:59:59
21600 0 days 15:00:00
Name: timeonly, Length: 21601, dtype: timedelta64[ns]
I want to merge the two so that the output df has values only where a timestamp coincides with one of the values in the Series:
Desired output:
time timestamp ref_time
Nan Nan 09:00:00
... ... ...
Nan Nan 09:38:29
18053.401736 2019-06-06 09:38:30+00:00 09:38:30
Nan Nan 09:38:31
... ... ...
18053.418252 2019-06-06 10:02:17+00:00 10:02:17
Nan Nan 10:02:18
Nan Nan 10:02:19
... ... ...
18053.424514 2019-06-06 10:11:18+00:00 10:11:18
... ... ...
18053.454132 2019-06-06 10:53:57+00:00 10:53:57
However, if I convert 'timestamp' to time-only values I get an object dtype, and I can't merge it with ss.
df['timestamp'].dtype # --> datetime64[ns, UTC]
df['timeonly'] = df['timestamp'].dt.time
df['timeonly'].dtype # --> object
df.merge(ss, how='outer', on=['timeonly'])
# ValueError: You are trying to merge on object and timedelta64[ns] columns. If you wish to proceed you should use pd.concat
but using concat as suggested doesn't give me the desired output. How can I merge/join the DataFrame and the Series? Pandas version 1.1.5
Recommended answer
The main challenge here is to get everything into compatible date types. Using your examples, slightly modified, as inputs:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
"""
time,timestamp
18053.401736,2019-06-06 09:38:30+00:00
18053.418252,2019-06-06 10:02:17+00:00
18053.424514,2019-06-06 10:11:18+00:00
18053.454132,2019-06-06 10:53:57+00:00
"""))
df['timestamp'] = pd.to_datetime(df['timestamp'])
from datetime import timedelta
sdf = pd.read_csv(StringIO(
"""
ref_time
0 days 09:00:00
0 days 09:00:01
0 days 09:00:02
0 days 09:00:03
0 days 09:00:04
0 days 09:38:30
0 days 10:02:17
0 days 14:59:56
0 days 14:59:57
0 days 14:59:58
0 days 14:59:59
0 days 15:00:00
"""))
sdf['ref_time'] = pd.to_timedelta(sdf['ref_time'])
The dtypes here match those in your question, which is important.
First we figure out the base_date, as we need to convert the timedeltas into datetimes. Note we set it to midnight of the relevant date via normalize(); round('1d') would go to the nearest midnight and so roll timestamps after noon over to the next day.
base_date = df['timestamp'].iloc[0].normalize().to_pydatetime()
base_date
Output
datetime.datetime(2019, 6, 6, 0, 0, tzinfo=<UTC>)
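One caveat worth checking when picking the base date: `round` goes to the nearest day rather than down to midnight. A minimal sketch comparing it with `floor` and `normalize`, using a made-up afternoon timestamp that is not part of the question's data:

```python
import pandas as pd

# Hypothetical afternoon timestamp, chosen only to illustrate the difference.
ts = pd.Timestamp("2019-06-06 14:59:56+00:00")

rounded = ts.round("D")       # nearest midnight: rolls forward once past noon
floored = ts.floor("D")       # midnight at the start of the same day
normalized = ts.normalize()   # sets the time to midnight, keeps the timezone

print(rounded)
print(floored)
print(normalized)
```

For morning timestamps like the first one in the example data, all three give the same date, which is why the difference is easy to miss.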
Next we add the timedeltas from sdf to base_date:
sdf['ref_dt'] = sdf['ref_time'] + base_date
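To see what this addition produces, here is a small standalone check. The names `ref_time`, `base` and `ref_dt` are stand-ins for `sdf['ref_time']`, `base_date` and `sdf['ref_dt']`, and the two values are illustrative only:

```python
from datetime import datetime, timezone

import pandas as pd

# Stand-ins for sdf['ref_time'] and base_date (illustrative values).
ref_time = pd.Series(pd.to_timedelta(["0 days 09:00:00", "0 days 09:38:30"]))
base = datetime(2019, 6, 6, tzinfo=timezone.utc)

# Adding a timezone-aware datetime to a timedelta Series gives datetimes.
ref_dt = ref_time + base

print(ref_time.dtype)  # a timedelta64 dtype
print(ref_dt.dtype)    # a timezone-aware datetime64 dtype
print(ref_dt)
```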
Now sdf['ref_dt'] and df['timestamp'] are in the same 'units' and of the same type, so we can merge
sdf.merge(df, left_on = 'ref_dt', right_on = 'timestamp', how = 'left')
Output
ref_time ref_dt time timestamp
-- --------------- ------------------------- ------- -------------------------
0 0 days 09:00:00 2019-06-06 09:00:00+00:00 nan NaT
1 0 days 09:00:01 2019-06-06 09:00:01+00:00 nan NaT
2 0 days 09:00:02 2019-06-06 09:00:02+00:00 nan NaT
3 0 days 09:00:03 2019-06-06 09:00:03+00:00 nan NaT
4 0 days 09:00:04 2019-06-06 09:00:04+00:00 nan NaT
5 0 days 09:38:30 2019-06-06 09:38:30+00:00 18053.4 2019-06-06 09:38:30+00:00
6 0 days 10:02:17 2019-06-06 10:02:17+00:00 18053.4 2019-06-06 10:02:17+00:00
7 0 days 14:59:56 2019-06-06 14:59:56+00:00 nan NaT
8 0 days 14:59:57 2019-06-06 14:59:57+00:00 nan NaT
9 0 days 14:59:58 2019-06-06 14:59:58+00:00 nan NaT
10 0 days 14:59:59 2019-06-06 14:59:59+00:00 nan NaT
11 0 days 15:00:00 2019-06-06 15:00:00+00:00 nan NaT
and we see the merge happening where needed.
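A possible alternative, sketched here as an assumption rather than as part of the answer above, is to go the other way: turn each timestamp into a timedelta since midnight and merge on that, which keeps `ref_time` as the key as in the desired output. The two-row `df` and the one-second grid `ss` below are rebuilt from the question's description for illustration:

```python
import pandas as pd

# Two of the question's rows, rebuilt by hand for illustration.
df = pd.DataFrame({
    "time": [18053.401736, 18053.418252],
    "timestamp": pd.to_datetime(
        ["2019-06-06 09:38:30+00:00", "2019-06-06 10:02:17+00:00"]
    ),
})

# One-second grid from 09:00:00 to 15:00:00, like the Series in the question.
seconds = list(range(9 * 3600, 15 * 3600 + 1))
ss = pd.Series(pd.to_timedelta(seconds, unit="s"), name="ref_time")

# Time since midnight as timedelta64, instead of object-dtype datetime.time.
df["ref_time"] = df["timestamp"] - df["timestamp"].dt.normalize()

# Left merge keeps every grid second and fills unmatched rows with NaN/NaT.
out = ss.to_frame().merge(df, on="ref_time", how="left")
matched = out[out["timestamp"].notna()]
print(matched)
```

Unlike the base_date approach, this does not assume that all timestamps fall on the same calendar day, since each one is measured against its own midnight.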