Pandas groupby: group by semester


Question

I need to group data by semester, but there is no frequency alias available for that.

2QS (2 quarters from start) and 6MS (6 months from start) won't do, because their bins start at different moments depending on the first datetime in my dataframe. (Quite counterintuitive and error-prone, IMHO: I didn't notice this issue until I used a different dataset that began in May instead of January...)

import numpy as np
import pandas as pd

# Daily data starting in mid-May 2017
days = pd.date_range(start="2017-05-17",
                     end="2017-11-29",
                     freq="1D")
df = pd.DataFrame({'DTIME': days, 'DATA': np.random.randint(50, high=80, size=len(days))})
df.set_index('DTIME', inplace=True)

# Group into two-quarter bins
grouped = df.groupby(pd.Grouper(freq='2QS'))
print("Groups date start:")
for dtime, group in grouped:
    print(dtime)
    # print(group)

This returns:

Groups date start:
2017-04-01 00:00:00   <== because my first datetime is in May, 2017
2017-10-01 00:00:00

instead of:

Groups date start:
2017-01-01 00:00:00   <== I want the semesters referred to the year!
2017-07-01 00:00:00

As a possible workaround, I created two new columns in my dataframe and then grouped by them:

      df["year"] = df.index.year.astype(int)
      df["semester"] = df.index.month.astype(int)
      df["semester"] = df["semester"] - 1
      df["semester"] = df["semester"] // 6
      grouped = df.groupby(["year", "semester"])

Is this the only way?

There are two other little questions, just for the sake of curiosity and not worth an independent Stack Overflow question:

  1. Why is the alias W (end of week) available, while WS (start of week) is not?

  2. How can this be written in one line?

  df["semester"] = df.index.month.astype(int)
  df["semester"] = df["semester"] - 1
  df["semester"] = df["semester"] // 6

Recommended answer

The closest thing is anchored offsets, but there is no month-based one.
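For readers unfamiliar with anchored offsets, here is a minimal sketch of a weekly one. It is an illustration only and not part of the original answer; the alias "W-MON" and the start date are chosen for the example, and the variable name `mondays` is hypothetical.

```python
import pandas as pd

# "W-MON" is a weekly offset anchored on Monday:
# every date it generates falls on a Monday.
mondays = pd.date_range(start="2017-05-17", periods=3, freq="W-MON")
print(mondays)
```

The weekday is fixed by the alias rather than by the first date in the data, which is the kind of anchoring the question asks for at semester level.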

Second:

df["semester"] =  (df.index.month.astype(int) - 1) // 6

Or without creating new columns:

years = df.index.year.astype(int)
semes = (df.index.month.astype(int) - 1) // 6
grouped = df.groupby([years, semes])
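Putting the pieces together, a self-contained sketch might look like the following. It reuses the date range from the question; the deterministic DATA values and the `starts` list, which turns each (year, semester) key back into a semester start timestamp, are additions made for illustration and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Same date range as in the question; the DATA values are arbitrary.
days = pd.date_range(start="2017-05-17", end="2017-11-29", freq="D")
df = pd.DataFrame({"DATA": np.arange(len(days))}, index=days)

# Calendar year and zero-based semester (0 = Jan-Jun, 1 = Jul-Dec).
years = df.index.year
semes = (df.index.month - 1) // 6

# Group by the two key arrays without adding columns to df.
grouped = df.groupby([years, semes])

# Illustrative extra step: rebuild the semester start date from each key.
starts = [
    pd.Timestamp(year=int(y), month=6 * int(s) + 1, day=1)
    for (y, s), _ in grouped
]

print("Groups date start:")
for start in starts:
    print(start)
```

Because the keys come from the calendar year and month, the groups do not shift when the first date in the data changes.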
