Need a workaround for handling timestamps in a dataframe and getting datetime
Problem description
I originally posted a question about plotting data with different datetime sampling, stored in many different dataframes, in the same plot.
I got help understanding that I needed to convert my time column ('ts') to datetime. I struggled with this and was still getting messed-up plots. It turns out my conversion to datetime isn't working, and this is a known thing, as stated here.
A dataframe can't store datetime in a column (why??); it converts it back to pandas._libs.tslibs.timestamps.Timestamp.
I need to figure out the best workaround for this to be able to plot large datasets.
In the post above, it is stated that a dataframe index can store the datetime format, but when I set my column as the index and try to loop through, I get a KeyError.
In[]: df.index.name
Out[]: 'ts'
but when I try:
for column in df.columns[1:]:
    df['ts'] = pd.to_datetime(df['ts'])
I get KeyError: 'ts'
Am I doing something wrong here? Does anyone know if datetime is stored correctly in the index?
However, I would still like to ask about the best work-around for this issue.
My bottom line is wanting to plot several dataframes correctly in the same plot. I have a lot of large datasets, and when trying out things, I am using two simplified dataframes, see below:
print(df1)
ts value
0 2019-10-18 08:13:26.702 14
1 2019-10-18 08:13:26.765 10
2 2019-10-18 08:13:26.790 5
3 2019-10-18 08:13:26.889 6
4 2019-10-18 08:13:26.901 8
5 2019-10-18 08:13:27.083 33
6 2019-10-18 08:13:27.098 21
7 2019-10-18 08:13:27.101 11
8 2019-10-18 08:13:27.129 22
9 2019-10-18 08:13:27.159 29
10 2019-10-18 08:13:27.188 7
11 2019-10-18 08:13:27.212 20
12 2019-10-18 08:13:27.228 24
13 2019-10-18 08:13:27.246 30
14 2019-10-18 08:13:27.395 34
15 2019-10-18 08:23:26.375 40
16 2019-10-18 08:23:26.527 49
17 2019-10-18 08:23:26.725 48
print(df2)
ts value
0 2019-10-18 08:23:26.375 27
1 2019-10-18 08:23:26.427 17
2 2019-10-18 08:23:26.437 4
3 2019-10-18 08:23:26.444 2
4 2019-10-18 08:23:26.527 39
5 2019-10-18 08:23:26.575 25
6 2019-10-18 08:23:26.662 6
7 2019-10-18 08:23:26.676 14
8 2019-10-18 08:23:26.718 11
9 2019-10-18 08:23:26.725 13
What is the best way to achieve the result I am looking for?
I have tried converting the 'ts' column to both an array and a list, but nothing seems to bring me closer to a final working result for plotting the datasets together. Converting to datetime in an array gives me numpy.datetime64; converting to datetime in a list gives me pandas._libs.tslibs.timestamps.Timestamp.
Any help is highly appreciated as this is really driving me crazy.
If needed, my original 'ts' values read from avro files are of type:
'2019-10-18T08:13:27.098000'
Running:
df['ts'] = pd.to_datetime(df['ts'])
returns
'2019-10-18 08:13:27.098' (pandas._libs.tslibs.timestamps.Timestamp)
EDIT 1
Further information about my steps. This is my df after reading the avro files:
This is my df after my first attempt to turn the format into datetime, which returns Timestamp:
This is what my df looks like after setting 'ts' as the index:
I then try to turn the timestamp into datetime while it is in the index, and I get a KeyError:
I guess I am having trouble figuring out what you are asking. Given a df of the form:
ts value
0 2019-10-18 08:13:26.702 14
1 2019-10-18 08:13:26.765 10
2 2019-10-18 08:13:26.790 5
3 2019-10-18 08:13:26.889 6
4 2019-10-18 08:13:26.901 8
5 2019-10-18 08:13:27.083 33
I can execute the following to convert the ts column to datetime with pd.to_datetime and make the ts column the index:
df['ts'] = pd.to_datetime(df['ts'])
df = df.set_index(['ts'], drop=True)
which yields the df of form
value
ts
2019-10-18 08:13:26.702 14
2019-10-18 08:13:26.765 10
2019-10-18 08:13:26.790 5
2019-10-18 08:13:26.889 6
2019-10-18 08:13:26.901 8
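As a possible explanation for the KeyError in the question, here is a minimal sketch. It assumes 'ts' was moved into the index before the conversion was attempted; the small sample frame and the helper names (has_ts_column, raised, index_is_datetime) are made up for illustration. Once 'ts' is the index it is no longer a column, so df['ts'] fails, and the conversion has to be applied to df.index instead.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of the frame above,
# using the ISO string format the question says comes from the avro files.
df = pd.DataFrame({
    "ts": ["2019-10-18T08:13:26.702000",
           "2019-10-18T08:13:26.765000",
           "2019-10-18T08:13:26.790000"],
    "value": [14, 10, 5],
})

# After set_index, 'ts' lives in the index and is no longer a column.
df = df.set_index("ts")
has_ts_column = "ts" in df.columns

# Looking it up as a column is what raises the KeyError.
try:
    df["ts"]
    raised = False
except KeyError:
    raised = True

# Convert the index itself instead of a column lookup.
df.index = pd.to_datetime(df.index)
index_is_datetime = isinstance(df.index, pd.DatetimeIndex)

print(has_ts_column, raised, index_is_datetime)
```

Converting the column first and setting the index afterwards, as in the two lines above the table, avoids the problem entirely.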
I can then print the values of the index, or for that matter use any iteration on the index I want. The following just gives the first 5 values.
for i in range(5):
    print(df.iloc[i].name)
2019-10-18 08:13:26.702000
2019-10-18 08:13:26.765000
2019-10-18 08:13:26.790000
2019-10-18 08:13:26.889000
2019-10-18 08:13:26.901000
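The values printed above are pandas Timestamp objects, and Timestamp is a subclass of Python's datetime.datetime, so they can generally be used wherever a datetime is expected. The sketch below checks this and then lines two frames up on one shared time index, which is one way (not necessarily the best one for very large data) to prepare them for a common plot. The two small frames and the names frames, combined, is_datetime and as_py are illustrative assumptions, using a few rows of df1 and df2 from the question.

```python
import datetime as dt
import pandas as pd

# Hypothetical cut-down versions of df1 and df2 from the question.
df1 = pd.DataFrame({
    "ts": ["2019-10-18T08:13:26.702000", "2019-10-18T08:23:26.375000"],
    "value": [14, 40],
})
df2 = pd.DataFrame({
    "ts": ["2019-10-18T08:23:26.375000", "2019-10-18T08:23:26.427000"],
    "value": [27, 17],
})

# Convert each 'ts' column, then use it as the index of a value series.
frames = {}
for name, frame in {"df1": df1, "df2": df2}.items():
    frame = frame.copy()
    frame["ts"] = pd.to_datetime(frame["ts"])
    frames[name] = frame.set_index("ts")["value"]

# A Timestamp is already a datetime; to_pydatetime gives a plain one if needed.
first = frames["df1"].index[0]
is_datetime = isinstance(first, dt.datetime)
as_py = first.to_pydatetime()

# Align both series on the union of their timestamps (missing points become NaN).
combined = pd.concat(frames, axis=1).sort_index()
print(combined)
```

Each series in frames can also be drawn on a shared matplotlib Axes by passing the same ax to Series.plot, which keeps the original sampling of every dataset without introducing NaN rows.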