Resample a dataframe with a column of lists

Problem description

I am trying to resample a DataFrame in pandas. My input is a .csv file like this (the lists in the Data column are stored as strings):

Name,Timestamp,Data
A1,5.26,"[1.0,1.2,1.9]"
A1,5.28,"[1.8,2.1,3.9]"
A1,5.30,"[1.2,1.4,0.9]"
A1,5.32,"[...]"
...
A2,5.26,"[...]"
A2,5.28,"[...]"
A2,5.30,"[...]"
A2,5.32,"[...]"
...
A3,5.26,"[...]"
A3,5.28,"[...]"
A3,5.30,"[...]"
A3,5.32,"[...]"

The data is recorded at 50 Hz (one sample every 20 ms). I want to resample it to 25 Hz (one sample every 40 ms).

I converted the Data column from strings to actual lists with:

df['Data'] = df['Data'].apply(ast.literal_eval)

and converted the Timestamp column, which is given in seconds, to datetimes with:

df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')

I know that I have to use the .resample() function, so I tried:

df.groupby('Name').resample("40L", on='Timestamp')

This doesn't raise an error, but it doesn't seem to resample at all: I get the same number of rows with the same data, and only the Timestamp column has been converted to datetime. (If I add .mean() after the resample call, I get the error No numeric types to aggregate.)
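A likely reading of this behaviour, shown as a minimal sketch rather than a confirmed diagnosis: the groupby/resample call only describes the binning and returns a resampler object, so nothing is aggregated until a method such as .mean() is applied, and df itself is never modified. The small frame below is hypothetical data in the question's shape, and '40ms' is used as the millisecond alias in place of '40L'.

```python
import pandas as pd

# Hypothetical frame shaped like the question's data (values are made up).
df = pd.DataFrame({
    'Name': ['A1', 'A1', 'A1', 'A1'],
    'Timestamp': pd.to_datetime([5.26, 5.28, 5.30, 5.32], unit='s'),
    'Data': [[1.0, 1.2, 1.9], [1.8, 2.1, 3.9], [1.2, 1.4, 0.9], [2.0, 2.0, 2.0]],
})

# This only sets up the grouping and the 40 ms bins; no aggregation runs yet.
resampler = df.groupby('Name').resample('40ms', on='Timestamp')

print(type(resampler).__name__)  # a resampler object, not a DataFrame
print(len(df))                   # df still has all of its original rows
print(df['Data'].dtype)          # object: the lists are not numeric values
```

Because the Data column holds Python lists (object dtype), a numeric aggregation such as .mean() has no numeric column to work on, which is consistent with the error quoted above.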

After resampling, I want my table to look like this:

Name Timestamp  Data
A1    5.26     [...]
A1    5.30     [...]
...
A2    5.26     [...]
A2    5.30     [...]
...
A3    5.26     [...]
A3    5.30     [...]

How can I do this?

Recommended answer

Your problem is converting the Data column into actual numeric data. ast.literal_eval doesn't cut it, because you cannot perform arithmetic on Python lists. Here's what I would do:

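import pandas as pd

df = pd.read_csv('your.csv')
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')

# split each "[a,b,c]" string into separate float columns 0, 1, 2
df = df.join(df['Data'].str[1:-1]
                       .str.split(',', expand=True)
                       .astype(float)
            )

# resample into 40 ms bins per Name ('40ms' is the same interval as the older '40L' alias);
# numeric_only=True skips the original string Data column, which recent pandas versions
# would otherwise refuse to average
df.groupby('Name').resample('40ms', on='Timestamp').mean(numeric_only=True)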

After that, the result looks like this:

                                0     1    2
Name Timestamp                              
A1   1970-01-01 00:00:05.240  1.0  1.20  1.9
     1970-01-01 00:00:05.280  1.5  1.75  2.4
     1970-01-01 00:00:05.320  1.4  1.65  2.9
     1970-01-01 00:00:05.360  1.5  1.75  2.4
     1970-01-01 00:00:05.400  1.2  1.40  0.9
A2   1970-01-01 00:00:05.240  1.0  1.20  1.9
     1970-01-01 00:00:05.280  1.5  1.75  2.4
     1970-01-01 00:00:05.320  1.4  1.65  2.9
     1970-01-01 00:00:05.360  1.5  1.75  2.4
     1970-01-01 00:00:05.400  1.2  1.40  0.9
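
The result above has one column per list element, while the question asks for a single Data column of lists. A minimal sketch of one way to get back to that shape follows; it is an illustration under stated assumptions, not part of the original answer. The inline CSV is hypothetical sample data (the rows elided as "[...]" in the question are filled with made-up values), and the names csv_text, values, value_cols and the v0/v1/v2 column prefix are introduced here for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample in the question's format; the values are made up.
csv_text = '''Name,Timestamp,Data
A1,5.26,"[1.0,1.2,1.9]"
A1,5.28,"[1.8,2.1,3.9]"
A1,5.30,"[1.2,1.4,0.9]"
A1,5.32,"[2.0,2.0,2.0]"
A2,5.26,"[0.0,1.0,2.0]"
A2,5.28,"[2.0,3.0,4.0]"
'''

df = pd.read_csv(io.StringIO(csv_text))
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')

# Split each "[a,b,c]" string into float columns named v0, v1, v2.
values = (df['Data'].str[1:-1]
                    .str.split(',', expand=True)
                    .astype(float)
                    .add_prefix('v'))
value_cols = list(values.columns)

# Average each element within 40 ms bins, separately for every Name.
numeric = df.drop(columns='Data').join(values)
means = (numeric.groupby('Name')
                .resample('40ms', on='Timestamp')
                .mean(numeric_only=True))

# Pack the averaged columns back into one list per row.
out = means.reset_index()
out['Data'] = out[value_cols].to_numpy().tolist()
out = out[['Name', 'Timestamp', 'Data']]
print(out)
```

Note that pandas chooses the bin edges itself, so the Timestamp labels in the result are bin starts and need not coincide with the original sample times, as the 00:00:05.240 label in the output above shows.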
