Pandas read csv dateint columns to datetime

Views: 176 | Published: 2017/4/15 16:48:34 | Tags: python, datetime, pandas


Problem description


I'm new to both StackOverflow and pandas. I am trying to read in a large CSV file with stock market bin data in the following format:

date,time,open,high,low,close,volume,splits,earnings,dividends,sym
20130625,715,49.2634,49.2634,49.2634,49.2634,156.293,1,0,0,JPM
20130625,730,49.273,49.273,49.273,49.273,208.39,1,0,0,JPM
20130625,740,49.1866,49.1866,49.1866,49.1866,224.019,1,0,0,JPM
20130625,745,49.321,49.321,49.321,49.321,208.39,1,0,0,JPM
20130625,750,49.3306,49.369,49.3306,49.369,4583.54,1,0,0,JPM
20130625,755,49.369,49.369,49.369,49.369,416.78,1,0,0,JPM
20130625,800,49.369,49.369,49.3594,49.3594,1715.05,1,0,0,JPM
20130625,805,49.369,49.369,49.3306,49.3306,1333.7,1,0,0,JPM
20130625,810,49.3306,49.3786,49.3306,49.3786,1567.09,1,0,0,JPM

I have the following code to read it into a pandas DataFrame:

import numpy as np
import scipy as sp
import pandas as pd
import datetime as dt
fname  = 'bindat.csv'
df     = pd.read_csv(fname, header=0, sep=',')

The problem is that the date and time columns are read in as int64. I would like to merge these two to a single timestamp such as: 2013-06-25 07:15:00.

I am struggling to even get the time read in properly using:

df['date'] = pd.to_datetime(df['date'].astype(str))
df['time'] = pd.to_datetime(df['time'].astype(str))

The first command works to convert, but the time seems weird.

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9999 entries, 0 to 9998
Data columns (total 11 columns):
date         9999 non-null datetime64[ns]
time         9999 non-null object
open         9999 non-null float64
high         9999 non-null float64
low          9999 non-null float64
close        9999 non-null float64
volume       9999 non-null float64
splits       9999 non-null float64
earnings     9999 non-null int64
dividends    9999 non-null float64
sym          9999 non-null object
dtypes: datetime64[ns](1), float64(7), int64(1), object(2)None

And then I'll want to merge into a single DatetimeIndex.

Any suggestions are greatly appreciated.

Cheers!

Solution

There are quite a few ways to do this. One way to do it during read_csv would be to use the parse_dates and date_parser arguments, telling parse_dates to combine the date and time columns and defining an inline function to parse the dates:

>>> df = pd.read_csv("bindat.csv", parse_dates=[["date", "time"]],
...                  date_parser=lambda x: pd.to_datetime(x, format="%Y%m%d %H%M"),
...                  index_col="date_time")
>>> df
                        open     high      low    close    volume  splits  earnings  dividends  sym
date_time                                                                                          
2013-06-25 07:15:00  49.2634  49.2634  49.2634  49.2634   156.293       1         0          0  JPM
2013-06-25 07:30:00  49.2730  49.2730  49.2730  49.2730   208.390       1         0          0  JPM
2013-06-25 07:40:00  49.1866  49.1866  49.1866  49.1866   224.019       1         0          0  JPM
2013-06-25 07:45:00  49.3210  49.3210  49.3210  49.3210   208.390       1         0          0  JPM
2013-06-25 07:50:00  49.3306  49.3690  49.3306  49.3690  4583.540       1         0          0  JPM
2013-06-25 07:55:00  49.3690  49.3690  49.3690  49.3690   416.780       1         0          0  JPM
2013-06-25 08:00:00  49.3690  49.3690  49.3594  49.3594  1715.050       1         0          0  JPM
2013-06-25 08:05:00  49.3690  49.3690  49.3306  49.3306  1333.700       1         0          0  JPM
2013-06-25 08:10:00  49.3306  49.3786  49.3306  49.3786  1567.090       1         0          0  JPM
2013-06-25 16:10:00  49.3306  49.3786  49.3306  49.3786  1567.090       1         0          0  JPM

where I added an extra row at the end to make sure that hours were behaving.
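
Another of the "quite a few ways" is to do the conversion after reading instead of inside read_csv. This may be useful on newer pandas releases, where the date_parser argument and the nested-list form of parse_dates are deprecated. The sketch below is only an illustration and not part of the original answer: it assumes the time column always holds an hour-minute integer such as 715 or 1610, and it uses a small in-memory sample (csv_text) in place of bindat.csv.

```python
import io

import pandas as pd

# Illustrative sample: two rows from the question plus the afternoon row
# from the answer. In practice this would be the real bindat.csv file.
csv_text = """date,time,open,high,low,close,volume,splits,earnings,dividends,sym
20130625,715,49.2634,49.2634,49.2634,49.2634,156.293,1,0,0,JPM
20130625,800,49.369,49.369,49.3594,49.3594,1715.05,1,0,0,JPM
20130625,1610,49.3306,49.3786,49.3306,49.3786,1567.09,1,0,0,JPM
"""

# Read date and time as strings so they are not converted to int64.
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str, "time": str})

# Left-pad the time to four digits ("715" -> "0715") so that %H%M
# always sees two digits for the hour and two for the minute.
stamp = df["date"] + df["time"].str.zfill(4)

# Parse the combined string with an explicit format and use it as the index.
df.index = pd.DatetimeIndex(
    pd.to_datetime(stamp, format="%Y%m%d%H%M"), name="date_time"
)
df = df.drop(columns=["date", "time"])

print(df.index)
```

The zero-padding step is the main difference from the one-liner above: it removes any ambiguity about where the hour ends in a three-digit value like 715.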
