Timeseries from CSV data (Timestamp and events)

Views: 595 Posted: 2020/10/17 0:21:24 Tags: python pandas matplotlib dataframe time-series

Problem description

I would like to visualize the CSV data shown below as a time series, using Python's pandas module (see the links below).

Sample data for df1:

             TIMESTAMP  eventid
0  2017-03-20 02:38:24        1
1  2017-03-21 05:59:41        1
2  2017-03-23 12:59:58        1
3  2017-03-24 01:00:07        1
4  2017-03-27 03:00:13        1

The 'eventid' column always contains the value 1, and I am trying to show the sum of events for each day in the dataset. Is pandas.Series.cumsum() the correct function to use for this purpose?

Script so far:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df1 = pd.read_csv('timestamp01.csv')
print(df1.columns)  # 'TIMESTAMP', 'eventid'

# I: ts = pd.Series(df1['eventid'], index=df1['TIMESTAMP']) 
# O: Blank plot

# I: ts = pd.Series(df1['eventid'], index=pd.date_range(df1['TIMESTAMP'], periods=1000)) 
# O: TypeError: Cannot convert input ... Name: TIMESTAMP, dtype: object] of type <class 'pandas.core.series.Series'> to Timestamp

# working test example:
# I: ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
# O: See first link below (first plot).

ts = ts.cumsum()
ts.plot()
plt.show()

Links I have tried to follow:

http://pandas.pydata.org/pandas-docs/stable/visualization.html

Aggregating time series from sensors

(the example above has different values, as opposed to my 'eventid' data)

d3: timeseries from data

Any help is much appreciated.

Recommended answer

It seems you need to convert the TIMESTAMP column to datetime using the parse_dates parameter of read_csv:

import pandas as pd
from io import StringIO

temp=u"""TIMESTAMP,eventid
2017-03-20 02:38:24,1
2017-03-20 05:38:24,1
2017-03-21 05:59:41,1
2017-03-23 12:59:58,1
2017-03-24 01:00:07,1
2017-03-27 03:00:13,1"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp),  parse_dates=True, index_col='TIMESTAMP')
print (df)
                     eventid
TIMESTAMP                   
2017-03-20 02:38:24        1
2017-03-20 05:38:24        1
2017-03-21 05:59:41        1
2017-03-23 12:59:58        1
2017-03-24 01:00:07        1
2017-03-27 03:00:13        1

print (df.index)
DatetimeIndex(['2017-03-20 02:38:24', '2017-03-20 05:38:24',
               '2017-03-21 05:59:41', '2017-03-23 12:59:58',
               '2017-03-24 01:00:07', '2017-03-27 03:00:13'],
              dtype='datetime64[ns]', name='TIMESTAMP', freq=None)
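
If the file has already been loaded without parse_dates, the conversion can also be done afterwards. The sketch below is not part of the original answer; the names csv_text, bad and ts are only illustrative. It also shows a likely reason for the blank plot in the question: passing an existing Series together with a new index makes pandas look up the new labels in the old index, so every value becomes NaN.

```python
import pandas as pd
from io import StringIO

# Illustrative stand-in for timestamp01.csv
csv_text = """TIMESTAMP,eventid
2017-03-20 02:38:24,1
2017-03-20 05:38:24,1
2017-03-21 05:59:41,1
2017-03-23 12:59:58,1
2017-03-24 01:00:07,1
2017-03-27 03:00:13,1"""

# Without parse_dates, TIMESTAMP is read as plain strings
df1 = pd.read_csv(StringIO(csv_text))

# The question's first attempt: eventid (indexed 0..5) is re-aligned to the
# timestamp labels, none of which exist in its index, so all values are NaN
bad = pd.Series(df1['eventid'], index=df1['TIMESTAMP'])
print(bad.isna().all())

# Passing the raw values avoids the alignment, and to_datetime
# turns the strings into a DatetimeIndex
ts = pd.Series(df1['eventid'].to_numpy(), index=pd.to_datetime(df1['TIMESTAMP']))
print(type(ts.index).__name__)
```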

Then use resample to group by day and get the counts with the size function. Finally, plot the result with Series.plot:

print (df.resample('D').size())
TIMESTAMP
2017-03-20    2
2017-03-21    1
2017-03-22    0
2017-03-23    1
2017-03-24    1
2017-03-25    0
2017-03-26    0
2017-03-27    1
Freq: D, dtype: int64

df.resample('D').size().plot()
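
On the cumsum() part of the question: cumsum() produces a running total, not a per-day total, so on its own it does not give the number of events per day. A small sketch of the difference, using the same sample data (the variable names daily, daily_sum and running are only illustrative):

```python
import pandas as pd
from io import StringIO

temp = """TIMESTAMP,eventid
2017-03-20 02:38:24,1
2017-03-20 05:38:24,1
2017-03-21 05:59:41,1
2017-03-23 12:59:58,1
2017-03-24 01:00:07,1
2017-03-27 03:00:13,1"""
df = pd.read_csv(StringIO(temp), parse_dates=['TIMESTAMP'], index_col='TIMESTAMP')

# Number of rows per calendar day (days without events get 0)
daily = df.resample('D').size()

# Because eventid is always 1, summing the column per day gives the same numbers
daily_sum = df['eventid'].resample('D').sum()

# Running total of events up to and including each day
running = daily.cumsum()

print(daily.tolist())    # [2, 1, 0, 1, 1, 0, 0, 1]
print(running.tolist())  # [2, 3, 3, 4, 5, 5, 5, 6]
```

Plotting running instead of daily would show the cumulative number of events over time, if that is the picture you are after.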

If you need to change the format of the tick labels:

import matplotlib.ticker as ticker

ax = df.resample('D').size().plot()
ax.xaxis.set_major_formatter(ticker.FixedFormatter(df.index.strftime('%Y-%m-%d')))
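
Note that FixedFormatter assigns its labels to ticks purely by position, so the labels taken from df.index may not line up with the dates actually drawn. As an alternative sketch, not taken from the original answer, the counts can be plotted with matplotlib directly and formatted with matplotlib.dates, which derives each label from the tick's own date:

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from io import StringIO

temp = """TIMESTAMP,eventid
2017-03-20 02:38:24,1
2017-03-20 05:38:24,1
2017-03-21 05:59:41,1
2017-03-23 12:59:58,1
2017-03-24 01:00:07,1
2017-03-27 03:00:13,1"""
df = pd.read_csv(StringIO(temp), parse_dates=['TIMESTAMP'], index_col='TIMESTAMP')
counts = df.resample('D').size()

fig, ax = plt.subplots()
# Plot through matplotlib itself so the x axis holds matplotlib date numbers
ax.plot(counts.index, counts.values, marker='o')

# One tick per day, labelled as YYYY-MM-DD
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
fig.autofmt_xdate()  # rotate the labels so they do not overlap

# Call plt.show() to display the figure
```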
