Pandas: Quickly add a variable number of months to a timestamp column

Problem description

Here's the setup:

I have two (integer-indexed) columns, start and month_delta. start holds timestamps (its internal type is np.datetime64[ns]) and month_delta holds integers.

I want to quickly produce a column that consists of each datetime in start, offset by the corresponding number of months in month_delta. How do I do this?
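For reference, one reasonable definition of the intended result (an assumption; the question does not say how month ends should be handled) is what adding a per-row pd.DateOffset(months=n) produces. The sketch below is a deliberately slow baseline that loops in Python, so it is only useful as a correctness check for faster versions; the sample values for start and month_delta are made up:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's two columns.
start = pd.Series(pd.to_datetime(['2000-01-25', '2000-01-31', '2000-02-02']))
month_delta = pd.Series([5, 1, 1])

# Row-by-row reference: correct under DateOffset semantics, but slow.
expected = pd.Series(
    [ts + pd.DateOffset(months=int(n)) for ts, n in zip(start, month_delta)],
    index=start.index,
)
print(expected.tolist())
```

Note that pd.DateOffset clamps to the last day of a shorter target month, so 2000-01-31 plus one month becomes 2000-02-29.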

Things I've tried that don't work:

  • apply is too slow.
  • You can't add a series of DateOffset objects to a series of datetime64[ns] dtype (or a DatetimeIndex).
  • You can't use a Series of timedelta64 objects either; Pandas silently converts month-based timedeltas to nanosecond-based timedeltas that are ~30 days long. (Yikes! What happened to not failing silently?)
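NumPy itself does exact calendar-month arithmetic, but only at month resolution: a datetime64[M] plus a timedelta64[M] moves by whole months regardless of how long each month is, while adding a month-unit timedelta to a day- or nanosecond-resolution datetime is rejected. A small illustration (the variable names are arbitrary):

```python
import numpy as np

jan = np.datetime64('2000-01', 'M')  # month-resolution datetime
one_month = np.timedelta64(1, 'M')   # month-unit timedelta

print(jan + one_month)       # 2000-02
print(jan + 13 * one_month)  # 2001-02

# Vectorized: each month is shifted by its own month count.
months = np.array(['2000-01', '2000-11'], dtype='M8[M]')
print(months + np.array([1, 3], dtype='m8[M]'))  # ['2000-02' '2001-02']
```

This month-resolution arithmetic is the property the solution builds on, by splitting each date into a month part and a day part.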

Currently I'm iterating over all different values of month_delta and doing a tshift by that amount on the relevant part of a DatetimeIndex I created, but this is a horrible kludge:

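import pandas as pd

# start and month_delta are the two columns described above
new_dates = pd.Series(pd.Timestamp.now(), index=start.index)
date_index = pd.DatetimeIndex(start)
for i in range(month_delta.max() + 1):  # + 1 so the largest delta is included
    mask = (month_delta == i)
    # tshift is legacy API (removed in pandas 2.0); 'M' is the month-end alias
    cur_dates = pd.Series(index=date_index[mask]).tshift(i, freq='M').index
    new_dates[mask] = cur_dates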
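The same loop-over-distinct-values idea still works on current pandas if tshift (removed in pandas 2.0) is replaced by a scalar pd.DateOffset, which pandas adds to a whole datetime Series in one vectorized operation. A minimal sketch; add_months_grouped is a hypothetical helper name and the sample data is made up:

```python
import pandas as pd

def add_months_grouped(start: pd.Series, month_delta: pd.Series) -> pd.Series:
    """Shift each timestamp in start by its month count in month_delta."""
    # Start from an all-NaT Series with the same index and dtype as start.
    result = pd.Series(pd.NaT, index=start.index, dtype=start.dtype)
    # One vectorized DateOffset addition per distinct month count.
    for n in month_delta.unique():
        mask = month_delta == n
        result[mask] = start[mask] + pd.DateOffset(months=int(n))
    return result

start = pd.Series(pd.to_datetime(['2000-01-25', '2000-01-31', '2000-02-02']))
month_delta = pd.Series([5, 1, 1])
print(add_months_grouped(start, month_delta))
```

It performs one vectorized addition per distinct value of month_delta, so it stays fast when there are few distinct month counts but approaches per-row cost as that number grows.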

Yuck! Any suggestions?

Solution

Here is a way to do it (by adding NumPy datetime64s with timedelta64s) without calling apply:

import pandas as pd
import numpy as np
np.random.seed(1)

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

def year(dates):
    "Return an array of the years given an array of datetime64s"
    return dates.astype('M8[Y]').astype('i8') + 1970

def month(dates):
    "Return an array of the months given an array of datetime64s"
    return dates.astype('M8[M]').astype('i8') % 12 + 1

def day(dates):
    "Return an array of the days of the month given an array of datetime64s"
    # floor division of two timedeltas yields exact integer day counts
    return (dates - dates.astype('M8[M]')) // np.timedelta64(1, 'D') + 1

N = 10
df = pd.DataFrame({
   'start': pd.date_range('2000-1-25', periods=N, freq='D'),
   'months': np.random.randint(12, size=N)})
start = df['start'].values
df['new_date'] = combine64(year(start), months=month(start) + df['months'], 
                           days=day(start))

print(df)

yields

   months      start   new_date
0       5 2000-01-25 2000-06-25
1      11 2000-01-26 2000-12-26
2       8 2000-01-27 2000-09-27
3       9 2000-01-28 2000-10-28
4      11 2000-01-29 2000-12-29
5       5 2000-01-30 2000-06-30
6       0 2000-01-31 2000-01-31
7       0 2000-02-01 2000-02-01
8       1 2000-02-02 2000-03-02
9       7 2000-02-03 2000-09-03
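One behavioral difference is worth knowing: because combine64 adds the original day of the month back as a plain day count, a date near the end of a month can roll into the following month (2000-01-31 plus one month lands on 2000-03-02), whereas pd.DateOffset clamps to the last day of the target month (2000-02-29). The sample data above happens not to hit this case. The sketch below restates the same month-plus-day decomposition in compact form and adds an optional clamp; add_months_numpy and its clamp parameter are hypothetical names, not part of the original answer:

```python
import numpy as np
import pandas as pd

def add_months_numpy(dates, months, clamp=False):
    """Shift datetime64 values by per-element month counts (day resolution)."""
    dates = np.asarray(dates, dtype='M8[D]')
    month_start = dates.astype('M8[M]')               # truncate to the month
    day_offset = dates - month_start.astype('M8[D]')  # days since the 1st, timedelta64[D]
    target = month_start + np.asarray(months, dtype='m8[M]')  # exact month arithmetic
    first_day = target.astype('M8[D]')
    if clamp:
        # Largest valid offset in the target month = its length minus one day.
        next_first = (target + np.timedelta64(1, 'M')).astype('M8[D]')
        last_offset = next_first - first_day - np.timedelta64(1, 'D')
        day_offset = np.minimum(day_offset, last_offset)
    return first_day + day_offset

dates = np.array(['2000-01-25', '2000-01-31'], dtype='M8[D]')
print(add_months_numpy(dates, [5, 1]))              # ['2000-06-25' '2000-03-02']
print(add_months_numpy(dates, [5, 1], clamp=True))  # ['2000-06-25' '2000-02-29']
print(pd.Timestamp('2000-01-31') + pd.DateOffset(months=1))  # 2000-02-29 00:00:00
```

Without clamp this follows combine64's roll-over behavior; with clamp=True it follows DateOffset's end-of-month clamping, while staying fully vectorized.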