Round pandas datetime index?


Problem description


I am reading multiple spreadsheets of time series into a pandas DataFrame and concatenating them on a common pandas datetime index. The datalogger that recorded the time series is not 100% accurate, which makes resampling very annoying: when a timestamp falls slightly above or below the interval being sampled, the result contains NaNs and my series starts to look like a broken line. Here's my code:

import time

import pandas as pd

def loaddata(filepaths):
    t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
    for i in range(len(filepaths)):
        xl = pd.ExcelFile(filepaths[i])
        df = xl.parse(xl.sheet_names[0], header=0, index_col=2, skiprows=[0,2,3,4], parse_dates=True)
        df = df.dropna(axis=1, how='all')
        df = df.drop(['Decimal Year Day', 'Decimal Year Day.1', 'RECORD'], axis=1)

        if i == 0:
            dfs = df
        else:
            # Align on the datetime index and add the new file's columns
            dfs = pd.concat([dfs, df], axis=1)
    t2 = time.perf_counter()
    print("Files loaded into dataframe in %s seconds" % (t2 - t1))
    return dfs  # without this, data below would be None

files = ["London Lysimeters corrected 5min.xlsx", "London Water Balance 5min.xlsx"]
data = loaddata(files)

Here's an idea of the index:

data.index

<class 'pandas.tseries.index.DatetimeIndex'>
[2012-08-27 12:05:00.000002, ..., 2013-07-12 15:10:00.000004]
Length: 91910, Freq: None, Timezone: None

What would be the fastest and most general way to round the index to the nearest minute?
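
To make the problem concrete, here is a minimal sketch of one way the NaNs can appear. The two series and their values are made up for illustration; the second index is offset by 2 microseconds, similar to the jitter visible in the index above.

```python
import pandas as pd

# Hypothetical data: two 5-minute series, one with 2 microseconds of logger jitter.
clean = pd.date_range("2012-08-27 12:05", periods=3, freq="5min")
jittered = clean + pd.Timedelta(microseconds=2)

a = pd.Series([1.0, 2.0, 3.0], index=clean, name="a")
b = pd.Series([10.0, 20.0, 30.0], index=jittered, name="b")

# No timestamps match exactly, so the result has 6 rows and every row
# holds a value in one column and NaN in the other.
misaligned = pd.concat([a, b], axis=1)
print(misaligned)
```

Because the timestamps never coincide, the combined frame has twice as many rows as either input, and each column is only half filled.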

Solution

Here's a little trick. Datetimes are stored as nanoseconds since the epoch (when viewed as np.int64), so you can divide by the number of nanoseconds in a minute, round, and multiply back to get whole minutes.
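
As a small sketch of what "viewed as np.int64" means, the snippet below converts two of the timestamps from the question to integer nanoseconds and shows how far each one is from a whole minute. The variable names are illustrative only, and the explicit cast to `datetime64[ns]` is an assumption made so the result does not depend on the index's internal resolution.

```python
import numpy as np
import pandas as pd

index = pd.DatetimeIndex(["2012-08-27 12:05:00.002", "2013-07-12 15:10:00.000004"])

# Integer nanoseconds since 1970-01-01 for each timestamp.
ns = index.values.astype("datetime64[ns]").astype(np.int64)

# One minute expressed in nanoseconds.
one_minute_ns = 60 * 10**9

# Distance past the previous whole minute, in nanoseconds.
remainder = ns % one_minute_ns
print(remainder)  # [2000000    4000]
```

The remainders (2 ms and 4 µs) are exactly the logger jitter that rounding needs to remove.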

In [75]: index = pd.DatetimeIndex([ pd.Timestamp('20120827 12:05:00.002'), pd.Timestamp('20130101 12:05:01'), pd.Timestamp('20130712 15:10:00'), pd.Timestamp('20130712 15:10:00.000004') ])

In [79]: index.values
Out[79]: 
array(['2012-08-27T08:05:00.002000000-0400',
       '2013-01-01T07:05:01.000000000-0500',
       '2013-07-12T11:10:00.000000000-0400',
       '2013-07-12T11:10:00.000004000-0400'], dtype='datetime64[ns]')

In [78]: pd.DatetimeIndex(((index.asi8/(1e9*60)).round()*1e9*60).astype(np.int64)).values
Out[78]: 
array(['2012-08-27T08:05:00.000000000-0400',
       '2013-01-01T07:05:00.000000000-0500',
       '2013-07-12T11:10:00.000000000-0400',
       '2013-07-12T11:10:00.000000000-0400'], dtype='datetime64[ns]')
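
For comparison, here is a hedged sketch of two alternatives that are not part of the answer above. It assumes a pandas version that provides `DatetimeIndex.round`, and it also shows an integer-only variant of the same trick that avoids going through floating point. Names such as `rounded_manual` are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.DatetimeIndex([
    "2012-08-27 12:05:00.002",
    "2013-01-01 12:05:01",
    "2013-07-12 15:10:00",
    "2013-07-12 15:10:00.000004",
])

# Built-in rounding (assumes DatetimeIndex.round is available).
rounded_builtin = index.round("min")

# Integer-only variant: add half a minute, then floor-divide by one minute.
one_minute_ns = 60 * 10**9
ns = index.values.astype("datetime64[ns]").astype(np.int64)
rounded_ns = (ns + one_minute_ns // 2) // one_minute_ns * one_minute_ns
rounded_manual = pd.DatetimeIndex(rounded_ns.astype("datetime64[ns]"))

print(rounded_builtin)
print(rounded_manual)
```

Both give the same result for these timestamps. The two approaches may treat values that fall exactly on a half minute differently, so it is worth checking that case if it matters for your data.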
