Create common columns and transform time-series-like data

Problem description

I have an Excel workbook that contains more than 30 sheets, one per parameter, such as BP, heart rate, etc.

One of the dataframes (df1, created from one sheet of the workbook) looks like this:

import pandas as pd

df1 = pd.DataFrame({'person_id':[1,1,1,1,2,2,2,2,3,3,3,3,3,3],'level_1': ['H1Date','H1','H2Date','H2','H1Date','H1','H2Date','H2','H1Date','H1','H2Date','H2','H3Date','H3'],
               'values': ['2006-10-30 00:00:00','6.6','2006-08-30 00:00:00','4.6','2005-10-30 00:00:00','6.9','2016-11-30 00:00:00','6.6','2006-10-30 00:00:00','6.6','2006-11-30 00:00:00','8.6',
                       '2106-10-30 00:00:00','16.6']})

Another dataframe (df2), from another sheet of the same Excel file, can be generated using the code below:

df2= pd.DataFrame({'person_id':[1,1,1,1,2,2,2,2,3,3,3,3,3,3],'level_1': ['GluF1Date','GluF1','GluF2Date','GluF2','GluF1Date','GluF1','GluF2Date','GluF2','GluF1Date','GluF1','GluF2Date','GluF2','GluF3Date','GluF3'],
               'values': ['2006-10-30 00:00:00','6.6','2006-08-30 00:00:00','4.6','2005-10-30 00:00:00','6.9','2016-11-30 00:00:00','6.6','2006-10-30 00:00:00','6.6','2006-11-30 00:00:00','8.6',
                       '2106-10-30 00:00:00','16.6']})

Similarly, there are more than 30 dataframes like this. Their values share the same format (a date and a measurement value), but the column names (H1, GluF1, H1Date, H100, H100Date, GluF1Date, P1, PDate, UACRDate, UACR100, etc.) are different.

Based on what I found by searching SO, this is what I have tried:

g = df1.level_1.str[-2:]  # Extracting column names
df1['lvl'] = df1.level_1.apply(lambda x: int(''.join(filter(str.isdigit, x))))  # Extracting level's number
df1 = df1.pivot_table(index=['person_id', 'lvl'], columns=g, values='values', aggfunc='first')
final = df1.reset_index(level=1).drop(['lvl'], axis=1)

The above code produces output that is not what I expect.

This doesn't work because g does not produce the same string (column name) for all records. My code would work if the substring extraction gave the same output for every record, but since the names form a sequence (H1, H2, ...), I am not able to make them uniform.
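To make the mismatch concrete, here is a small check of what the last-two-characters slice returns. The label list is illustrative (it mixes labels from df1 with the longer H10/H100 forms mentioned in the question) and is not taken from the real workbook.

```python
import pandas as pd

# Illustrative labels: the df1 pattern plus the longer forms (H10, H100)
# that the question says can occur.
labels = pd.Series(["H1Date", "H1", "H2Date", "H2", "H10Date", "H10", "H100"])

# Same slice as `g` in the attempt above.
suffixes = labels.str[-2:].tolist()
print(suffixes)
# ['te', 'H1', 'te', 'H2', 'te', '10', '00']
```

Every date row collapses to 'te', while each measurement row keeps a different suffix, so pivoting on g yields one column per measurement label instead of a single value column.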

I expect the output for each dataframe to look as shown below. Note that a person can have 3 records (H1..H3), 10 records (H1..H10), or 100 records (e.g. H1...H100); all are possible.

[Updated screenshot]

Recommended answer

Concatenate the even-positioned rows (the dates) and the odd-positioned rows (the measurements) side by side, selecting by position rather than by column name, then name the columns as needed:

res = pd.concat([df2.iloc[0::2,0:3:2].reset_index(drop=True), df2.iloc[1::2,2].reset_index(drop=True)], axis=1)
res.columns = ['Person_ID', 'Date', 'Value']

Output:

   Person_ID                 Date Value
0          1  2006-10-30 00:00:00   6.6
1          1  2006-08-30 00:00:00   4.6
2          2  2005-10-30 00:00:00   6.9
3          2  2016-11-30 00:00:00   6.6
4          3  2006-10-30 00:00:00   6.6
5          3  2006-11-30 00:00:00   8.6
6          3  2106-10-30 00:00:00  16.6
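The positional approach relies on every date row being immediately followed by its measurement row. If that ordering cannot be guaranteed, a name-based variant is possible. The following is only a sketch, not part of the original answer: the helper name `tidy_measurements` and the regular expression are assumptions, and it presumes every label has the form letters, then a number, then an optional `Date` suffix (a label without a number, such as PDate, would not match and would raise an error here).

```python
import pandas as pd

def tidy_measurements(df):
    """Pair '<prefix><n>Date' rows with '<prefix><n>' rows by label (hypothetical helper)."""
    # Split e.g. 'GluF12Date' into seq='12' and kind='Date'; plain 'GluF12'
    # gets kind=NaN, which is relabelled as 'Value'.
    parts = df["level_1"].str.extract(r"^[A-Za-z]+?(?P<seq>\d+)(?P<kind>Date)?$")
    keyed = df.assign(seq=parts["seq"].astype(int),
                      kind=parts["kind"].fillna("Value"))
    # One row per (person, sequence number); 'Date' and 'Value' become columns.
    wide = (keyed.set_index(["person_id", "seq", "kind"])["values"]
                 .unstack("kind")
                 .sort_index()
                 .reset_index())
    wide.columns.name = None
    # The source stores everything as strings, so convert to proper dtypes.
    wide["Date"] = pd.to_datetime(wide["Date"])
    wide["Value"] = pd.to_numeric(wide["Value"])
    wide = wide.rename(columns={"person_id": "Person_ID"})
    return wide[["Person_ID", "Date", "Value"]]

# df1 as defined in the question.
df1 = pd.DataFrame({
    "person_id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    "level_1": ["H1Date", "H1", "H2Date", "H2", "H1Date", "H1", "H2Date", "H2",
                "H1Date", "H1", "H2Date", "H2", "H3Date", "H3"],
    "values": ["2006-10-30 00:00:00", "6.6", "2006-08-30 00:00:00", "4.6",
               "2005-10-30 00:00:00", "6.9", "2016-11-30 00:00:00", "6.6",
               "2006-10-30 00:00:00", "6.6", "2006-11-30 00:00:00", "8.6",
               "2106-10-30 00:00:00", "16.6"],
})

tidy = tidy_measurements(df1)
print(tidy)
```

Because the parameter prefix is discarded by the pattern, the same function can be applied unchanged to df2 or to any of the other sheets that follow the same labelling scheme.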
