Timeseries from CSV data (Timestamp and events)
Problem description
I would like to visualize the CSV data shown below as a time series, using Python's pandas module (see the link below).
Sample data for df1:
TIMESTAMP eventid
0 2017-03-20 02:38:24 1
1 2017-03-21 05:59:41 1
2 2017-03-23 12:59:58 1
3 2017-03-24 01:00:07 1
4 2017-03-27 03:00:13 1
The 'eventid' column always contains the value 1, and I am trying to show the sum of events for each day in the dataset. Is pandas.Series.cumsum() the correct function to use for this purpose?
Script so far:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df1 = pd.read_csv('timestamp01.csv')
print(df1.columns) # 'TIMESTAMP', 'eventid'
# I: ts = pd.Series(df1['eventid'], index=df1['TIMESTAMP'])
# O: Blank plot
# I: ts = pd.Series(df1['eventid'], index=pd.date_range(df1['TIMESTAMP'], periods=1000))
# O: TypeError: Cannot convert input ... Name: TIMESTAMP, dtype: object] of type <class 'pandas.core.series.Series'> to Timestamp
# working test example:
# I: ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
# O: See first link below (first plot).
ts = ts.cumsum()
ts.plot()
plt.show()
Link I have tried to follow:
http://pandas.pydata.org/pandas-docs/stable/visualization.html
(The example above uses different values from my 'eventid' data.)
Any help is much appreciated.
Recommended answer
It seems you need to convert the TIMESTAMP column to datetime with the parse_dates parameter of read_csv:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is no longer available in recent pandas
temp=u"""TIMESTAMP,eventid
2017-03-20 02:38:24,1
2017-03-20 05:38:24,1
2017-03-21 05:59:41,1
2017-03-23 12:59:58,1
2017-03-24 01:00:07,1
2017-03-27 03:00:13,1"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=True, index_col='TIMESTAMP')
print (df)
eventid
TIMESTAMP
2017-03-20 02:38:24 1
2017-03-20 05:38:24 1
2017-03-21 05:59:41 1
2017-03-23 12:59:58 1
2017-03-24 01:00:07 1
2017-03-27 03:00:13 1
print (df.index)
DatetimeIndex(['2017-03-20 02:38:24', '2017-03-20 05:38:24',
'2017-03-21 05:59:41', '2017-03-23 12:59:58',
'2017-03-24 01:00:07', '2017-03-27 03:00:13'],
dtype='datetime64[ns]', name='TIMESTAMP', freq=None)
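If the file has already been read without parse_dates, the same result can probably be reached after the fact. The following is a minimal sketch, assuming a small inline sample (the `csv_text` string and the variable names are made up for illustration) in place of the real file: it converts the column with pd.to_datetime and then sets it as the index.

```python
from io import StringIO

import pandas as pd

# Hypothetical inline sample standing in for the real CSV file.
csv_text = """TIMESTAMP,eventid
2017-03-20 02:38:24,1
2017-03-21 05:59:41,1
2017-03-23 12:59:58,1"""

# Read without date parsing: TIMESTAMP comes back as plain strings.
raw = pd.read_csv(StringIO(csv_text))

# Convert the column explicitly, then use it as the index.
raw["TIMESTAMP"] = pd.to_datetime(raw["TIMESTAMP"])
indexed = raw.set_index("TIMESTAMP")

print(type(indexed.index).__name__)  # DatetimeIndex
```

Either route should leave the frame with a DatetimeIndex, which is what the blank plot and the TypeError in the question were missing.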
Then use resample by day and get the counts with the size function. Finally, plot the result with Series.plot:
print (df.resample('D').size())
TIMESTAMP
2017-03-20 2
2017-03-21 1
2017-03-22 0
2017-03-23 1
2017-03-24 1
2017-03-25 0
2017-03-26 0
2017-03-27 1
Freq: D, dtype: int64
df.resample('D').size().plot()
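On the cumsum() part of the question: as far as I can tell, cumsum() gives a running total rather than a per-day total, so it only fits if a cumulative curve is what you want. The sketch below, which reuses the sample data above (the names `daily`, `daily_sum` and `running` are just illustrative), compares the per-day counts with their cumulative sum.

```python
from io import StringIO

import pandas as pd

# Same sample rows as in the answer above.
csv_text = """TIMESTAMP,eventid
2017-03-20 02:38:24,1
2017-03-20 05:38:24,1
2017-03-21 05:59:41,1
2017-03-23 12:59:58,1
2017-03-24 01:00:07,1
2017-03-27 03:00:13,1"""

df = pd.read_csv(StringIO(csv_text), parse_dates=["TIMESTAMP"], index_col="TIMESTAMP")

# Events per calendar day; days without events show up as 0.
daily = df.resample("D").size()

# Because eventid is always 1, summing it per day gives the same numbers.
daily_sum = df["eventid"].resample("D").sum()

# Running total of events up to and including each day.
running = daily.cumsum()

print(daily.tolist())    # [2, 1, 0, 1, 1, 0, 0, 1]
print(running.tolist())  # [2, 3, 3, 4, 5, 5, 5, 6]
```

Calling daily.plot() draws the per-day counts, while running.plot() draws the cumulative curve.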
If you need to change the format of the tick labels:
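import matplotlib.dates as mdates
# DateFormatter is swapped in for FixedFormatter here: labels built from df.index
# (one per event) would not line up with the daily ticks of the resampled series.
# x_compat=True makes pandas use matplotlib's date units so the formatter applies.
ax = df.resample('D').size().plot(x_compat=True)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))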