Need work-around for handling timestamps in dataframe and get datetime


Problem description

I originally posted a question about plotting different datetime-sampling in the same plot, stored in many different dataframes.

I got help understanding I needed to convert my time-column (‘ts’) to datetime. I struggled with this, still getting messed up plots. Turns out my conversion to datetime isn’t working, and this is a known thing, as stated here.

A dataframe can’t store datetime in a column (why??), it converts it back to pandas._libs.tslibs.timestamps.Timestamp.

I need to figure out the best work around this to be able to plot large datasets.

In the post above, it is stated that dataframe index can store datetime format, but when I set my column as index, and try to loop through, I get key error.

 In[]: df.index.name
 Out[]: 'ts'

but when I try:

for column in df.columns[1:]:
    df['ts'] = pd.to_datetime(df['ts'])

I get KeyError: 'ts'

Am I doing something wrong here? Does anyone know if datetime is stored correctly in the index?

However, I would still like to ask about the best work-around for this issue.

My bottom line is wanting to plot several dataframes correctly in the same plot. I have a lot of large datasets, and when trying out things, I am using two simplified dataframes, see below:

print(df1)
                        ts  value
0  2019-10-18 08:13:26.702     14
1  2019-10-18 08:13:26.765     10
2  2019-10-18 08:13:26.790      5
3  2019-10-18 08:13:26.889      6
4  2019-10-18 08:13:26.901      8
5  2019-10-18 08:13:27.083     33
6  2019-10-18 08:13:27.098     21
7  2019-10-18 08:13:27.101     11
8  2019-10-18 08:13:27.129     22
9  2019-10-18 08:13:27.159     29
10 2019-10-18 08:13:27.188      7
11 2019-10-18 08:13:27.212     20
12 2019-10-18 08:13:27.228     24
13 2019-10-18 08:13:27.246     30
14 2019-10-18 08:13:27.395     34
15 2019-10-18 08:23:26.375     40
16 2019-10-18 08:23:26.527     49
17 2019-10-18 08:23:26.725     48

print(df2)
                       ts  value
0 2019-10-18 08:23:26.375     27
1 2019-10-18 08:23:26.427     17
2 2019-10-18 08:23:26.437      4
3 2019-10-18 08:23:26.444      2
4 2019-10-18 08:23:26.527     39
5 2019-10-18 08:23:26.575     25
6 2019-10-18 08:23:26.662      6
7 2019-10-18 08:23:26.676     14
8 2019-10-18 08:23:26.718     11
9 2019-10-18 08:23:26.725     13

What is the best way to achieve the result I am looking for?

I have tried converting the 'ts' column to both an array and a list, but nothing seems to bring me closer to a final working result for plotting the datasets together. Converting to datetime in an array gives me numpy.datetime64; converting to datetime in a list gives me pandas._libs.tslibs.timestamps.Timestamp.

Any help is highly appreciated as this is really driving me crazy.

If needed, my original 'ts' values read from avro files are of type:

 '2019-10-18T08:13:27.098000'

Running:

df['ts'] = pd.to_datetime(df['ts'])

returns

'2019-10-18 08:13:27.098'  (pandas._libs.tslibs.timestamps.Timestamp)

EDIT 1

Further information about my steps, this is my df after reading the avro files:

This is my df after first attempt to turn the format into datetime, returns timestamp:

This is what my df looks like after setting 'ts' as index:

I then try to turn the timestamp to datetime when it's in the index, and I get a KeyError:

Solution

I guess I am having trouble figuring out what you are asking. Given a df of the form:

                       ts  value
0 2019-10-18 08:13:26.702     14
1 2019-10-18 08:13:26.765     10
2 2019-10-18 08:13:26.790      5
3 2019-10-18 08:13:26.889      6
4 2019-10-18 08:13:26.901      8
5 2019-10-18 08:13:27.083     33

I can execute the following to convert the ts column to datetime values with pd.to_datetime and make the ts column the index:

df['ts'] = pd.to_datetime(df['ts'])
df = df.set_index(['ts'], drop=True)

which yields the df of form

                         value
ts
2019-10-18 08:13:26.702     14
2019-10-18 08:13:26.765     10
2019-10-18 08:13:26.790      5
2019-10-18 08:13:26.889      6
2019-10-18 08:13:26.901      8
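
As a rough illustration of why the KeyError in the question shows up, here is a minimal sketch. It assumes a small hypothetical frame built from the ISO strings mentioned in the question; the variable names (first, as_py, and so on) are mine, not from the original post. Once 'ts' is the index it is no longer a column, so df['ts'] fails and df.index should be used instead. The sketch also checks that a pandas Timestamp is already a datetime.datetime subclass, so the conversion most likely worked all along.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample mirroring the ISO strings read from the avro files.
df = pd.DataFrame({
    "ts": [
        "2019-10-18T08:13:26.702000",
        "2019-10-18T08:13:26.765000",
        "2019-10-18T08:13:26.790000",
    ],
    "value": [14, 10, 5],
})

# Convert the strings; the column now has a datetime64 dtype.
df["ts"] = pd.to_datetime(df["ts"])
column_is_datetime = pd.api.types.is_datetime64_any_dtype(df["ts"])

# A single element comes back as pd.Timestamp, which subclasses datetime.datetime.
first = df["ts"].iloc[0]
is_datetime = isinstance(first, dt.datetime)

# After set_index, 'ts' lives in the index, not in the columns,
# so df["ts"] would raise KeyError: 'ts'.
df = df.set_index("ts")
has_ts_column = "ts" in df.columns

# The index is a DatetimeIndex; work with df.index directly.
index_is_datetime = isinstance(df.index, pd.DatetimeIndex)

# If plain datetime.datetime objects are really needed, convert explicitly.
as_py = df.index.to_pydatetime()

print(column_is_datetime, is_datetime, has_ts_column, index_is_datetime)
```

If the frame has already been indexed by 'ts', calling df.reset_index() would bring 'ts' back as a regular column.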

I can then print the values of the index, or for that matter use any iteration on the index I want. The following just gives the first 5 values.

for i in range(5):
    print(df.iloc[i].name)

2019-10-18 08:13:26.702000
2019-10-18 08:13:26.765000
2019-10-18 08:13:26.790000
2019-10-18 08:13:26.889000
2019-10-18 08:13:26.901000
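
Since the underlying goal in the question is to plot several dataframes in the same plot, the following is one possible sketch using matplotlib, which the original answer does not mention. It assumes each frame has a 'ts' and a 'value' column as in the question; the sample rows are a subset of df1 and df2 from the post, and the frames dictionary and labels are illustrative only.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# A few rows taken from df1 and df2 in the question.
df1 = pd.DataFrame({
    "ts": ["2019-10-18 08:13:26.702", "2019-10-18 08:13:27.083",
           "2019-10-18 08:23:26.375", "2019-10-18 08:23:26.725"],
    "value": [14, 33, 40, 48],
})
df2 = pd.DataFrame({
    "ts": ["2019-10-18 08:23:26.375", "2019-10-18 08:23:26.527",
           "2019-10-18 08:23:26.725"],
    "value": [27, 39, 13],
})

frames = {"df1": df1, "df2": df2}  # hypothetical labels for the legend

fig, ax = plt.subplots()
for label, frame in frames.items():
    # Convert 'ts' and move it to the index without mutating the original frame.
    indexed = frame.assign(ts=pd.to_datetime(frame["ts"])).set_index("ts")
    # Both lines share one Axes, so they share one datetime x-axis.
    ax.plot(indexed.index, indexed["value"], marker="o", label=label)

ax.set_xlabel("ts")
ax.set_ylabel("value")
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
# In an interactive session, drop the Agg line above and call plt.show() here.
```

Because every frame is plotted against its own DatetimeIndex on the same Axes, the different sampling times do not need to be aligned or resampled first.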
