Python: join dataframes from different days with different timestamps and the same column names


Problem description


I have data coming from 26 different power plants, plus sunlight sensor data from the field. Each sensor produces one dataframe per day, and I have 14 days of data, so I actually have 26x14 = 364 individual dataframes. Finally, I want to combine all of them into one. Below is some sample data:

## Data coming in form of dataframes from field sensors 
sundf
ptA_d1df ### PlantA_Day1Dataframe
ptB_d1df
ptA_d2df
ptB_d2df
ptA_d3df
ptB_d3df

dflist = [ptA_d1df, ptB_d1df,ptA_d2df,ptB_d2df,ptA_d3df,ptB_d3df]


The dataframes for a given day have different timestamps, as shown below:

sundf = 
                          light
2019-01-10 07:35:00     500
2019-01-10 07:36:00     510
2019-01-10 07:37:00     520
2019-01-10 07:38:00     530
2019-01-10 07:39:00     540
2019-01-10 08:14:00     550
2019-01-10 08:15:00     560
2019-01-10 08:16:00     570
2019-01-10 08:17:00     580
2019-01-10 08:18:00     590
2019-01-21 07:34:00     600
2019-01-21 07:35:00     610
2019-01-21 07:36:00     620
2019-01-21 07:38:00     630
2019-01-21 07:39:00     640
2019-01-21 07:40:00     650
2019-01-21 08:14:00     660
2019-01-21 08:15:00     670
2019-01-21 08:16:00     680
2019-01-21 08:18:00     690
2019-01-21 08:19:00     700
2019-01-21 08:20:00     710

ptA_d1df =
                       A
2019-01-10 07:35:22   10
2019-01-10 08:15:17   20

ptB_d1df = 
                       B
2019-01-10 07:38:45   30
2019-01-10 08:18:57   40

ptA_d2df = 
                       A
2019-01-21 07:35:42   50
2019-01-21 08:15:17   60

ptB_d2df = 
                       B
2019-01-21 07:39:04   70
2019-01-21 08:19:22   80


I want to combine the above into one dataframe so that the data from each session is grouped together. That is, the readings of both plants around 7:35 AM should share one timestamp, not two, and columns with the same name should go into the same column instead of creating new columns.

My code is as follows:

import pandas as pd

dflist = [ptA_d1df, ptB_d1df, ptA_d2df, ptB_d2df]
l = []
sundf.index = pd.to_datetime(sundf.index)
sundf['time'] = sundf.index  # keep the sun timestamp as a column
for df in dflist:
    # nearest sun reading for each plant timestamp
    k1 = sundf.reindex(df.index, method='nearest')
    l.append(k1.join(df).set_index('time').reindex(k1.index, method='nearest'))
combdf = pd.concat(l, axis=1)


The output I get has different timestamps and repeated columns, as shown below:

  datetime           light  A   B  light  A   B  light  A   B  light   A      B  
2019-01-10 07:35:22   500   10  NaN   ........................ ..
2019-01-10 07:38:45   NaN ......... 530  .  30 Nan.................... 
2019-01-10 08:15:17   560   20  Nan........................ 
2019-01-10 08:18:57   NaN ......... 590  . 40  Nan........................ 
2019-01-21 07:35:42   NaN .                      610   50 
2019-01-21 07:39:04                                             640      70
2019-01-21 08:15:17   
2019-01-21 08:19:22 
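The repeated `light`, `A` and `B` columns above are what `pd.concat(..., axis=1)` does by design: it places frames side by side and keeps every column even when names repeat, while the default `axis=0` stacks frames and lines up same-named columns. A minimal sketch with two hypothetical one-column frames for plant A (values taken from `ptA_d1df` and `ptA_d2df`):

```python
import pandas as pd

# Plant A on two different days: same column name, non-overlapping timestamps
day1 = pd.DataFrame(
    {"A": [10, 20]},
    index=pd.to_datetime(["2019-01-10 07:35:22", "2019-01-10 08:15:17"]),
)
day2 = pd.DataFrame(
    {"A": [50, 60]},
    index=pd.to_datetime(["2019-01-21 07:35:42", "2019-01-21 08:15:17"]),
)

# axis=1: outer join on the index, columns side by side -> two columns named "A"
side_by_side = pd.concat([day1, day2], axis=1)

# axis=0 (default): rows stacked, same-named columns aligned -> one column "A"
stacked = pd.concat([day1, day2])

print(side_by_side.columns.tolist())  # ['A', 'A']
print(stacked.columns.tolist())       # ['A']
```

So frames for the same plant on different days should be stacked with `axis=0`, and only frames for different plants placed side by side.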


I want output like the following:

combdf = 
  datetime            light  A   light   B  
2019-01-10 07:35:22   500   10   530    30
2019-01-10 08:15:17   560   20   590    40  
2019-01-21 07:35:42   610   50   640    70
2019-01-21 08:15:17   670   60   700    80
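In the wanted table, each `light` value is the latest sun reading at or before that plant's timestamp (for example, plant B at 07:38:45 gets 530, the 07:38 reading). That pattern matches `pd.merge_asof` with its default `direction="backward"`, which is a different technique from the binning used in the recommended answer. A minimal sketch for day 1 only, assuming the frames are rebuilt with a `date` column instead of a datetime index and that one session's readings fall inside the same 5-minute window; the helper `attach_light` and the `light_A`/`light_B` column names are hypothetical, chosen so the two sun columns do not share a name:

```python
import pandas as pd

# Day-1 sun readings from sundf (sorted by time, as merge_asof requires)
times = ["07:35", "07:36", "07:37", "07:38", "07:39",
         "08:14", "08:15", "08:16", "08:17", "08:18"]
sun = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-10 " + t for t in times]),
    "light": list(range(500, 600, 10)),
})

ptA = pd.DataFrame({"date": pd.to_datetime(["2019-01-10 07:35:22", "2019-01-10 08:15:17"]),
                    "A": [10, 20]})
ptB = pd.DataFrame({"date": pd.to_datetime(["2019-01-10 07:38:45", "2019-01-10 08:18:57"]),
                    "B": [30, 40]})


def attach_light(plant, name):
    """Attach the latest sun reading at or before each plant timestamp."""
    # merge_asof's default direction is "backward"
    out = pd.merge_asof(plant.sort_values("date"), sun, on="date")
    # Assumption: all readings of one session fall in the same 5-minute window
    out["session"] = out["date"].dt.floor("5min")
    return out.rename(columns={"light": f"light_{name}"}).drop(columns="date")


a = attach_light(ptA, "A")
b = attach_light(ptB, "B")
combdf = (a.merge(b, on="session", how="outer")
           .set_index("session")[["light_A", "A", "light_B", "B"]])
print(combdf)
```

For all 14 days the same code applies once `sun`, `ptA` and `ptB` hold every day's rows; passing `tolerance=pd.Timedelta("5min")` to `merge_asof` would stop a plant reading from matching a sun reading from an earlier session.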


Recommended answer


One solution is to divide time into 5-minute time frames, rounding each timestamp down to the start of its window, and then merge on the rounded timestamp:

import pandas as pd

sun = [['2019-01-10 07:35:00', 500], ['2019-01-10 08:15:00', 560],
       ['2019-01-21 07:35:00', 610], ['2019-01-21 08:15:00', 670]]
A = [['2019-01-10 07:36:22', 10], ['2019-01-10 08:15:17', 20],
     ['2019-01-21 07:35:42', 50], ['2019-01-21 08:15:17', 60]]

B = [['2019-01-10 07:35:27', 30], ['2019-01-10 08:15:17', 40],
     ['2019-01-21 07:35:22', 70], ['2019-01-21 08:17:37', 80]]
dfsun = pd.DataFrame(sun, columns=['date', 'light'])
dfa = pd.DataFrame(A, columns=['date', 'A'])
dfb = pd.DataFrame(B, columns=['date', 'B'])
dfa['date'] = pd.to_datetime(dfa['date']).dt.floor('5min')  # round down to 5-minute bins
dfb['date'] = pd.to_datetime(dfb['date']).dt.floor('5min')
dfsun['date'] = pd.to_datetime(dfsun['date'])

df = pd.merge(dfa, dfb, on='date', how="outer")
df = pd.merge(dfsun, df, on='date', how="right")
print(df)

Out []:

                 date  light   A   B
0 2019-01-10 07:35:00    500  10  30
1 2019-01-10 08:15:00    560  20  40
2 2019-01-21 07:35:00    610  50  70
3 2019-01-21 08:15:00    670  60  80
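`dt.floor('5min')` maps each timestamp to the start of its 5-minute window, so readings from the same session share a bin only if they fall inside the same window. A small check, where the last two timestamps are hypothetical, shows the edge case: two readings 10 seconds apart land in different bins when they straddle a boundary.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2019-01-10 07:35:22",  # plant A in the question
    "2019-01-10 07:38:45",  # plant B in the question
    "2019-01-10 07:39:55",  # hypothetical reading just before 07:40
    "2019-01-10 07:40:05",  # hypothetical reading just after 07:40
]))

bins = ts.dt.floor("5min")
print(bins.dt.strftime("%H:%M").tolist())  # ['07:35', '07:35', '07:35', '07:40']
```

If sessions can straddle a boundary, a wider bin such as `'10min'`, or starting a new group wherever the gap between consecutive sorted readings exceeds a threshold, may be needed instead.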


If your data is already in CSV files named 'file_{day}_{sensor}.csv' (for example, file_1_2.csv), the same idea can be applied in a loop:

import pandas as pd

days = []
for day in range(1, 15):  # CSVs named 'file_{day}_{sensor}.csv', e.g. 'file_1_2.csv'
    dfDay = None
    for sensor in range(1, 27):
        dfNew = pd.read_csv(f'file_{day}_{sensor}.csv', names=['date', sensor])
        dfNew['date'] = pd.to_datetime(dfNew['date']).dt.floor('5min')
        # sensors of the same day go side by side, joined on the 5-minute bin
        dfDay = dfNew if dfDay is None else pd.merge(dfDay, dfNew, on='date', how="outer")
    days.append(dfDay)
# days do not overlap in time, so stack them; same-named sensor columns line up
df = pd.concat(days, ignore_index=True)
df = pd.merge(dfsun, df, on='date', how="right")
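To try a day/sensor loop without real files, `pd.read_csv` can read in-memory text through `io.StringIO`. A minimal sketch with 2 days and 2 sensors using the plant values from the question: each day's sensors are merged side by side on the 5-minute bin, then the days are stacked with `pd.concat`. The `csv_text` dictionary is hypothetical and stands in for the `file_{day}_{sensor}.csv` files, which are assumed to have no header row (as `names=` implies); the final merge with the sun data is left out.

```python
import io

import pandas as pd

# Hypothetical stand-in for file_{day}_{sensor}.csv: (day, sensor) -> CSV text
csv_text = {
    (1, 1): "2019-01-10 07:35:22,10\n2019-01-10 08:15:17,20\n",
    (1, 2): "2019-01-10 07:38:45,30\n2019-01-10 08:18:57,40\n",
    (2, 1): "2019-01-21 07:35:42,50\n2019-01-21 08:15:17,60\n",
    (2, 2): "2019-01-21 07:39:04,70\n2019-01-21 08:19:22,80\n",
}

days = []
for day in (1, 2):
    dfDay = None
    for sensor in (1, 2):
        dfNew = pd.read_csv(io.StringIO(csv_text[(day, sensor)]), names=["date", sensor])
        dfNew["date"] = pd.to_datetime(dfNew["date"]).dt.floor("5min")
        # Sensors of the same day sit side by side, joined on the 5-minute bin
        dfDay = dfNew if dfDay is None else pd.merge(dfDay, dfNew, on="date", how="outer")
    days.append(dfDay)

# Days do not overlap in time, so they are stacked; sensor columns line up by name
df = pd.concat(days, ignore_index=True)
print(df)
```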
