Pandas: create timestamp from 3 columns: Month, Day, Hour

Problem description

I'm using Python 2.7, pandas 0.14.1-2, numpy 1.8.1-1. I have to use Python 2.7 because I'm coupling it with something that doesn't work on Python 3.

I'm trying to analyze a CSV file that stores Month, Day and Hour in separate columns, and looks something like the following:

Month  Day  Hour  Value
1      1    1     105
1      1    2     30
1      1    3     85
1      1    4     52
1      1    5     65

I basically want to create a timestamp from those columns, use "2005" as the year, and set this new timestamp column as the index. I've read a lot of similar questions (here and here), but they all rely on doing the conversion during read_csv(). I don't have a year column, so I don't think this applies to me (aside from loading the dataframe, inserting the column, writing it out, and redoing read_csv(), which seems convoluted).

After loading the dataframe, I insert a Year column at position 0:

df.insert(0, "Year", 2005)

So now I have:

Year  Month  Day  Hour  Value
2005  1      1    1     105
2005  1      1    2     30
2005  1      1    3     85
2005  1      1    4     52
2005  1      1    5     65

df.dtypes tells me that all columns are of type int64.

Then I try this:

df['Datetime'] = pd.to_datetime(df.Year*1000000 + df.Month*10000 + df.Day+100 + df.Hour, format="%Y%M%d%H")

But I'm getting "TypeError: 'long' object is unsliceable".

On the other hand, the following runs without errors:

df['Datetime'] = pd.to_datetime(df.Year*10000 + df.Month*100 + df.Day, format="%Y%M%d")

Since Python 2.7 doesn't like the %Y%M%d%H format, as pointed out by @EdChum, I tried doing it in two steps: creating a datetime without hours, then adding the hours afterwards. But the output is not what I wanted:

In [1]: # Do it without hours first (otherwise doesn't work in Python 2.7)
df['Datetime'] = pd.to_datetime(df.Year*10000 + df.Month*100 + df.Day, format="%Y%M%d")

In [2]: df['Datetime']
Out [2]:
0    2005-01-01 00:01:00
1    2005-01-01 00:01:00
...
13   2005-01-01 00:01:00
14   2005-01-01 00:01:00
...
8745   2005-01-31 00:12:00
8746   2005-01-31 00:12:00
...
8758   2005-01-31 00:12:00
8759   2005-01-31 00:12:00

For example, row 8758 should be 2005-12-31. What is going wrong here?

Once I resolve that, I'll be able to re-add the hours:

In [3]: # Then add the hours
df['Datetime'] = df['Datetime'] + pd.to_timedelta(df['Hour'], unit="h")

Recommended answer

You could parse the input text in your question using pandas.read_csv():

#!/usr/bin/env python
from datetime import datetime
import pandas as pd

print(pd.read_csv(
    'input.txt', sep=r'\s+', parse_dates=[[0, 1, 2]],
    date_parser=lambda *columns: datetime(2005, *map(int, columns)),
    index_col=0))

Output

                     Value
Month_Day_Hour            
2005-01-01 01:00:00    105
2005-01-01 02:00:00     30
2005-01-01 03:00:00     85
2005-01-01 04:00:00     52
2005-01-01 05:00:00     65
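
As for the unexpected output in the question: in strftime-style format strings, %M is the minute directive and %m is the month. With format="%Y%M%d" the month digits are read as minutes, which matches the 00:01:00 and 00:12:00 values shown and the month staying at January. The block below is a minimal sketch of the two-step approach with %m instead, not necessarily how the asker resolved it. The small dataframe is made up for illustration (the last row is a hypothetical December entry), and the date key is converted to strings before parsing, which is an assumption chosen to avoid passing raw integers to to_datetime.

```python
import pandas as pd

# Illustrative data: first two rows from the question, last row is hypothetical.
df = pd.DataFrame({
    "Month": [1, 1, 12],
    "Day": [1, 1, 31],
    "Hour": [1, 2, 23],
    "Value": [105, 30, 65],
})

# Step 1: build YYYYMMDD keys with the fixed year 2005 and parse them.
# Note the lowercase %m (month); uppercase %M would mean minutes.
date_key = (2005 * 10000 + df["Month"] * 100 + df["Day"]).astype(str)
dates = pd.to_datetime(date_key, format="%Y%m%d")

# Step 2: add the hours as a timedelta and use the result as the index.
df["Datetime"] = dates + pd.to_timedelta(df["Hour"], unit="h")
df = df.set_index("Datetime")

print(df["Value"])
```

Adding the hours as a timedelta, rather than parsing them with %H, also means an hour value of 24 would roll over to the next day instead of failing to parse. Whether the real file numbers its hours 1 to 24 is an assumption; the sample only shows hours 1 to 5.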
