Getting year and week from a datetime series in a dask dataframe?


Problem description

If I have a Pandas dataframe with a column of datetime type, I can get the year as follows:

df['year'] = df['date'].dt.year

With a dask dataframe, that does not work. If I compute first, like this:

df['year'] = df['date'].compute().dt.year

I get ValueError: Not all divisions are known, can't align partitions. Please use set_index or set_partition to set the index.

But if I do this:

df['date'].head().dt.year

it works fine!

So how do I get the year (or week) of a datetime series in a dask dataframe?

Recommended answer

The .dt datetime namespace is present on Dask series objects. Here is a self-contained example of its use:

In [1]: import pandas as pd

In [2]: df = pd.util.testing.makeTimeSeries().to_frame().reset_index().head(10)

In [3]: df  # some pandas data to turn into a dask.dataframe
Out[3]: 
       index         0
0 2000-01-03 -0.034297
1 2000-01-04 -0.373816
2 2000-01-05 -0.844751
3 2000-01-06  0.924542
4 2000-01-07  0.507070
5 2000-01-10  0.216684
6 2000-01-11  1.191743
7 2000-01-12 -2.103547
8 2000-01-13  0.156629
9 2000-01-14  1.602243

In [4]: import dask.dataframe as dd

In [5]: ddf = dd.from_pandas(df, npartitions=3)

In [6]: ddf['year'] = ddf['index'].dt.year  # use the .dt namespace on the dask series

In [7]: ddf.head()
Out[7]: 
       index         0  year
0 2000-01-03 -0.034297  2000
1 2000-01-04 -0.373816  2000
2 2000-01-05 -0.844751  2000
3 2000-01-06  0.924542  2000
