在 pandas 数据框中将datetime64列拆分为日期和时间列 [英] Split datetime64 column into a date and time column in pandas dataframe

查看:101
本文介绍了在 pandas 数据框中将datetime64列拆分为日期和时间列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果我的数据框的第一列是datetime64列.如何将此列拆分为2个新列,即日期列和时间列.到目前为止,这是我的数据和代码:

If I have a dataframe with the first column being a datetime64 column. How do I split this column into 2 new columns, a date column and a time column. Here is my data and code so far:

DateTime,Actual,Consensus,Previous
20140110 13:30:00,74000,196000,241000
20131206 13:30:00,241000,180000,200000
20131108 13:30:00,200000,125000,163000
20131022 12:30:00,163000,180000,193000
20130906 12:30:00,193000,180000,104000
20130802 12:30:00,104000,184000,188000
20130705 12:30:00,188000,165000,176000
20130607 12:30:00,176000,170000,165000
20130503 12:30:00,165000,145000,138000
20130405 12:30:00,138000,200000,268000
...


import pandas as pd
nfp = pd.read_csv("NFP.csv", parse_dates=[0])
nfp

赠予:

Out[10]: <class 'pandas.core.frame.DataFrame'>
         Int64Index: 83 entries, 0 to 82
         Data columns (total 4 columns):
         DateTime     82  non-null values
         Actual       82  non-null values
         Consensus    82  non-null values
         Previous     82  non-null values
         dtypes: datetime64[ns](1), float64(3)

一切都很好,但不确定从这里做什么.

All good but not sure what to do from here.

我不确定两点:

  1. 当我首先读取csv文件时,是否可以这样做?如果可以,怎么办?
  2. 执行csv_read后,任何人都可以帮助我如何进行拆分吗?

还有什么地方可以查找此类信息?

Also is there anywhere I can look up this kind of information?

很难找到类库的详细参考资料,谢谢!

Having a hard time finding a detailed reference of the class libraries Thanks!

推荐答案

如何将CSV直接解析为所需的DataFrame:

将函数的字典传递给pandas.read_csvconverters关键字参数:

Pass a dict of functions to pandas.read_csv's converters keyword argument:

import pandas as pd
import datetime as DT
nfp = pd.read_csv("NFP.csv", 
                  sep=r'[\s,]',              # 1
                  header=None, skiprows=1,
                  converters={               # 2
                      0: lambda x: DT.datetime.strptime(x, '%Y%m%d'),  
                      1: lambda x: DT.time(*map(int, x.split(':')))},
                  names=['Date', 'Time', 'Actual', 'Consensus', 'Previous'])

print(nfp)

收益

        Date      Time  Actual  Consensus  Previous
0 2014-01-10  13:30:00   74000     196000    241000
1 2013-12-06  13:30:00  241000     180000    200000
2 2013-11-08  13:30:00  200000     125000    163000
3 2013-10-22  12:30:00  163000     180000    193000
4 2013-09-06  12:30:00  193000     180000    104000
5 2013-08-02  12:30:00  104000     184000    188000
6 2013-07-05  12:30:00  188000     165000    176000
7 2013-06-07  12:30:00  176000     170000    165000
8 2013-05-03  12:30:00  165000     145000    138000
9 2013-04-05  12:30:00  138000     200000    268000

  1. sep=r'[\s,]'告诉read_csv在屏幕上拆分csv的行 正则表达式模式r'[\s,]'-空格或逗号.
  2. converters参数告诉read_csv应用给定 某些列的功能.键(例如0和1)是指 列索引,值是要应用的函数.
  1. sep=r'[\s,]' tells read_csv to split lines of the csv on the regex pattern r'[\s,]' -- a whitespace or a comma.
  2. The converters parameter tells read_csv to apply the given functions to certain columns. The keys (e.g. 0 and 1) refer to the column index, and the values are the functions to be applied.


执行csv_read后如何拆分DataFrame

import pandas as pd
nfp = pd.read_csv("NFP.csv", parse_dates=[0], infer_datetime_format=True)
temp = pd.DatetimeIndex(nfp['DateTime'])
nfp['Date'] = temp.date
nfp['Time'] = temp.time
del nfp['DateTime']

print(nfp)


哪个更快?

这取决于CSV的大小. (感谢杰夫指出这一点.)

It depends on the size of the CSV. (Thanks to Jeff for pointing this out.)

对于小型CSV,直接将CSV解析为所需格式比使用parse_dates=[0]解析后使用DatetimeIndex更快:

For tiny CSVs, parsing the CSV into the desired form directly is faster than using a DatetimeIndex after parsing with parse_dates=[0]:

def using_converter():
    nfp = pd.read_csv("NFP.csv", sep=r'[\s,]', header=None, skiprows=1,
                      converters={
                          0: lambda x: DT.datetime.strptime(x, '%Y%m%d'),
                          1: lambda x: DT.time(*map(int, x.split(':')))},
                      names=['Date', 'Time', 'Actual', 'Consensus', 'Previous'])
    return nfp

def using_index():
    nfp = pd.read_csv("NFP.csv", parse_dates=[0], infer_datetime_format=True)
    temp = pd.DatetimeIndex(nfp['DateTime'])
    nfp['Date'] = temp.date
    nfp['Time'] = temp.time
    del nfp['DateTime']
    return nfp

In [114]: %timeit using_index()
100 loops, best of 3: 1.71 ms per loop

In [115]: %timeit using_converter()
1000 loops, best of 3: 914 µs per loop

但是,对于只有几百行或更多行的CSV,使用DatetimeIndex更快.

However, for CSVs of just a few hundred lines or more, using a DatetimeIndex is faster.

N = 20
filename = '/tmp/data'
content = '''\
DateTime,Actual,Consensus,Previous
20140110 13:30:00,74000,196000,241000
20131206 13:30:00,241000,180000,200000
20131108 13:30:00,200000,125000,163000
20131022 12:30:00,163000,180000,193000
20130906 12:30:00,193000,180000,104000
20130802 12:30:00,104000,184000,188000
20130705 12:30:00,188000,165000,176000
20130607 12:30:00,176000,170000,165000
20130503 12:30:00,165000,145000,138000
20130405 12:30:00,138000,200000,268000'''

def setup(n):
    header, remainder = content.split('\n', 1)
    with open(filename, 'w') as f:
        f.write('\n'.join([header]+[remainder]*n))

In [304]: setup(50)

In [305]: %timeit using_converter()
100 loops, best of 3: 9.78 ms per loop

In [306]: %timeit using_index()
100 loops, best of 3: 9.3 ms per loop


我在哪里可以找到这类信息?

  1. 有时您可以在 Pandas Cookbook 中找到示例.
  2. li>
  3. 有时通过网络搜索或搜索Stackoverflow就足够了.
  4. 度过一个漫长的周末,除了阅读 熊猫文档肯定也会有所帮助.
  5. 安装 IPython .它具有制表符补全,并且如果您在 函数,它为您提供函数的文档字符串.这两个功能 确实可以帮助您快速内省Python对象.它还会告诉您该函数在哪个文件中定义(如果在纯Python中定义)-导致我...
  6. 阅读源代码
  1. Sometimes you can find examples in the Pandas Cookbook.
  2. Sometimes web searching or searching Stackoverflow suffices.
  3. Spending a weekend snowed in with nothing to do but reading the pandas documentation will surely help too.
  4. Install IPython. It has tab completion and if you type a ? after a function, it gives you the function's docstring. Those two features really help you introspect Python objects quickly. It also tells you in what file the function is defined (if defined in pure Python) -- which leads me to...
  5. Reading the source code

只要坚持下去.了解得越多,它越容易获得.

Just keep at it. The more you know the easier it gets.

如果您尽力而为,但仍然找不到答案,请在Stackoverflow上发布问题.希望您会很快得到答案,并帮助其他人搜索相同的内容.

If you give it your best shot and still can't find the answer, post a question on Stackoverflow. You'll hopefully get an answer quickly, and help others searching for the same thing.

这篇关于在 pandas 数据框中将datetime64列拆分为日期和时间列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆