合并日期时间列上的数据(POSIXct 格式) [英] Merging Data on date time column (POSIXct format)
本文介绍了合并日期时间列上的数据(POSIXct 格式)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我想合并日期时间列上的两个数据框 dtype.date-time 列包含相似和不同的值.但我无法合并它们,以便所有唯一的日期时间行最终都在那里......在不常见的列中使用 NA.我在第二个数据框的 date_time 列中获取 NA.在 R 和 python 中都尝试过
I want to merge two data frames on Date Time column dtype.date-time columns contain both similar and different values. But I am unable to merge them such that all unique date-time rows are finally there..with NA in uncommon columns. I am getting NAs in date_time column for 2nd data frame. tried both in R and python
df=pd.merge(df_met, df_so2, how='left', on='Date_Time')
在 R..data_type 中是使用 as.POSIXct 的日期时间
df_2<-join(so2, met_km, type="inner")
df3 <- merge(so2, met_km, all = TRUE)
df_4 <- merge(so2, met_km, by.x = "Date_Time", by.y = "Date_Time")
df_so2:
X POC Datum Date_Time Date_GMT Sample.Measurement MDL
1 2 WGS84 2015-01-01 3:00 01/01/2015 09:00 2.3 0.2
2 2 WGS84 2015-01-01 4:00 01/01/2015 10:00 2.5 0.2
3 2 WGS84 2015-01-01 5:00 01/01/2015 11:00 2.1 0.2
4 2 WGS84 2015-01-01 6:00 01/01/2015 12:00 2.3 0.2
5 2 WGS84 2015-01-01 7:00 01/01/2015 13:00 1.1 0.2
df_met:
X Date_Time air_temp_set_1 dew_point_temperature_set_1
1 2015-01-01 1:00 35.6 35.6
2 2015-01-01 2:00 35.6 35.6
3 2015-01-01 3:00 35.6 35.6
4 2015-01-01 4:00 33.8 33.8
5 2015-01-01 5:00 33.2 33.2
6 2015-01-01 6:00 33.8 33.8
7 2015-01-01 7:00 33.8 33.8
预期输出:
X POC Datum Date_Time Date_GMT Sample.Measurement MDL
1 1.0 2 WGS84 2015-01-01 3:00 01/01/2015 09:00 2.3 0.2
2 2.0 2 WGS84 2015-01-01 4:00 01/01/2015 10:00 2.5 0.2
3 NaN NaN 2015-01-01 1:00 NaN NaN NaN
4 NaN NaN 2015-01-01 2:00 NaN NaN NaN
推荐答案
merge on outer 应该得到他们所有:
pandas.DataFrame.合并
outer
:使用来自两个帧的键的并集,类似于 SQL 全外连接;按字典顺序对键进行排序.- 根据您的评论,您需要所有日期,而不仅仅是
预期输出
中显示的日期 - 添加
参数
,sort=True
,如果你想让它们按date
排序
pandas.DataFrame.merge
outer
: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.- based upon your comment, you want all the dates, not just those shown in
Expected Output
- add the
parameter
,sort=True
if you want them sorted bydate
merge on outer should get them all:
df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')
X_x POC Datum Date_Time Date_GMT Sample.Measurement MDL X_y air_temp_set_1 dew_point_temperature_set_1
1.0 2.0 WGS84 2015-01-01 3:00 01/01/2015 09:00 2.3 0.2 3 35.6 35.6
2.0 2.0 WGS84 2015-01-01 4:00 01/01/2015 10:00 2.5 0.2 4 33.8 33.8
3.0 2.0 WGS84 2015-01-01 5:00 01/01/2015 11:00 2.1 0.2 5 33.2 33.2
4.0 2.0 WGS84 2015-01-01 6:00 01/01/2015 12:00 2.3 0.2 6 33.8 33.8
5.0 2.0 WGS84 2015-01-01 7:00 01/01/2015 13:00 1.1 0.2 7 33.8 33.8
NaN NaN NaN 2015-01-01 1:00 NaN NaN NaN 1 35.6 35.6
NaN NaN NaN 2015-01-01 2:00 NaN NaN NaN 2 35.6 35.6
没有来自 df_met
的列:
df_exp.drop(columns=['X_y', 'air_temp_set_1', 'dew_point_temperature_set_1'], inplace=True)
df_exp.rename(columns={'X_x': 'X'}, inplace=True)
X POC Datum Date_Time Date_GMT Sample.Measurement MDL
1.0 2.0 WGS84 2015-01-01 3:00 01/01/2015 09:00 2.3 0.2
2.0 2.0 WGS84 2015-01-01 4:00 01/01/2015 10:00 2.5 0.2
3.0 2.0 WGS84 2015-01-01 5:00 01/01/2015 11:00 2.1 0.2
4.0 2.0 WGS84 2015-01-01 6:00 01/01/2015 12:00 2.3 0.2
5.0 2.0 WGS84 2015-01-01 7:00 01/01/2015 13:00 1.1 0.2
NaN NaN NaN 2015-01-01 1:00 NaN NaN NaN
NaN NaN NaN 2015-01-01 2:00 NaN NaN NaN
这篇关于合并日期时间列上的数据(POSIXct 格式)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文