Pandas: Quickly add variable number of months to a timestamp column

Views: 287  Published: 2017/4/14 6:21:40  python datetime numpy pandas


Problem description

Here's the setup:

I have two (integer-indexed) columns, start and month_delta. start has timestamps (its internal type is np.datetime64[ns]) and month_delta is integers.

I want to quickly produce the column consisting of each datetime in start, offset by the corresponding number of months in month_delta. How do I do this?

Things I've tried that don't work:


apply is too slow.
You can't add a series of DateOffset objects to a series of datetime64[ns] dtype (or a DatetimeIndex).
You can't use a Series of timedelta64 objects either; Pandas silently converts month-based timedeltas to nanosecond-based timedeltas that are ~30 days long. (Yikes! What happened to not failing silently?)


Currently I'm iterating over all different values of month_delta and doing a tshift by that amount on the relevant part of a DatetimeIndex I created, but this is a horrible kludge:
new_dates = pd.Series(pd.Timestamp.now(), index=start.index)
date_index = pd.DatetimeIndex(start)
for i in xrange(month_delta.max()):
    mask = (month_delta == i)
    cur_dates = pd.Series(index=date_index[mask]).tshift(i, freq='M').index
    new_dates[mask] = cur_dates
Yuck! Any suggestions?
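
For reference, the row-by-row baseline that the question describes as too slow might look roughly like the sketch below. The sample data and the name slow are assumptions made for illustration; only start, month_delta, and DateOffset come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the two columns in the question.
start = pd.Series(pd.to_datetime(['2000-01-25', '2000-01-31', '2000-02-01']))
month_delta = pd.Series([5, 1, 12])

# One DateOffset object per row: correct, but it runs at Python-loop speed,
# which is why it does not scale to large columns.
slow = pd.Series(
    [ts + pd.DateOffset(months=int(m)) for ts, m in zip(start, month_delta)],
    index=start.index,
)
print(slow)
```

Note that DateOffset clamps to the end of the month, so 2000-01-31 plus one month gives 2000-02-29 here. This is a useful reference point when comparing against any vectorized alternative.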
Solution
Here is a way to do it (by adding NumPy datetime64s with timedelta64s) without calling apply:
import pandas as pd
import numpy as np
np.random.seed(1)

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

def year(dates):
    "Return an array of the years given an array of datetime64s"
    return dates.astype('M8[Y]').astype('i8') + 1970

def month(dates):
    "Return an array of the months given an array of datetime64s"
    return dates.astype('M8[M]').astype('i8') % 12 + 1

def day(dates):
    "Return an array of the days of the month given an array of datetime64s"
    return (dates - dates.astype('M8[M]')) / np.timedelta64(1, 'D') + 1

N = 10
df = pd.DataFrame({
   'start': pd.date_range('2000-1-25', periods=N, freq='D'),
   'months': np.random.randint(12, size=N)})
start = df['start'].values
df['new_date'] = combine64(year(start), months=month(start) + df['months'], 
                           days=day(start))

print(df)
yields
   months      start   new_date
0       5 2000-01-25 2000-06-25
1      11 2000-01-26 2000-12-26
2       8 2000-01-27 2000-09-27
3       9 2000-01-28 2000-10-28
4      11 2000-01-29 2000-12-29
5       5 2000-01-30 2000-06-30
6       0 2000-01-31 2000-01-31
7       0 2000-02-01 2000-02-01
8       1 2000-02-02 2000-03-02
9       7 2000-02-03 2000-09-03
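
When only months need to be added, the same NumPy unit arithmetic can probably be condensed. The following is a sketch rather than part of the original answer, and add_months is a hypothetical helper name. It truncates each timestamp to the first of its month, shifts by whole months, then adds back the remainder.

```python
import numpy as np

def add_months(dates, months):
    """Shift datetime64 values by a per-element number of months (hypothetical helper)."""
    dates = np.asarray(dates, dtype='M8[ns]')
    month_start = dates.astype('M8[M]')       # truncate to the first of the month
    remainder = dates - month_start           # timedelta64[ns] elapsed within the month
    shifted = month_start + np.asarray(months).astype('m8[M]')  # whole-month shift
    return shifted + remainder                # back to datetime64[ns]

start = np.array(['2000-01-25', '2000-01-31'], dtype='M8[ns]')
print(add_months(start, [5, 1]))
```

Like combine64 above, this approach rolls day-of-month overflow into the following month instead of clamping: 2000-01-31 plus one month comes out as 2000-03-02, not 2000-02-29. If month-end clamping is required, the result would need an extra correction step.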


                        