Extracting the data belonging to each day from a given range of dates in a dataset

Problem description

I have a dataset of values covering the date range from January 12 to August 3, 2018.

The shape of the my_df DataFrame is:

my_df.shape 
(9752, 2)

Each row covers a half-hour interval.

The first row starts on 2018-01-12:

my_df.iloc[0]
Date:       2018-01-12 00:17:28
Value                      1
Name: 0, dtype: object

The last row ends on 2018-08-03:

my_df.tail(1)
                  Date:     Value
9751    2018-08-03 23:44:59  1

My goal is to select the rows corresponding to each day and export them to a CSV file, one file per day.

To get only the January 12 data and save it to a readable file, I run:

# Select the rows for a single day (January 12)
my_df_Jan12 = my_df[(my_df['Date:'] >= '2018-01-12 00:00:00')
                    & (my_df['Date:'] <= '2018-01-12 23:59:59')]
my_df_Jan12.to_csv('Data_Jan_12.csv', sep=',', header=True, index=False)

From January 12 to August 3 there are 203 days (29 weeks).
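
The day count can be checked with the standard library. This is a small sketch with illustrative variable names: the difference between the two dates is 203 days, while counting both end points gives 204 calendar dates, so a one-file-per-day export over the full range could produce up to 204 files.

```python
from datetime import date, timedelta

start = date(2018, 1, 12)
end = date(2018, 8, 3)

# Elapsed days between the two dates (the end point is not counted)
elapsed = (end - start).days

# Every calendar date in the range, both end points included
all_days = [start + timedelta(days=n) for n in range(elapsed + 1)]

print(elapsed)        # 203
print(len(all_days))  # 204
```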

I don't want to run this query manually for each day, so I am trying the following basic analysis:

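  • I need to generate 203 files (one file per day)
  • The days of January start at the 12th (January 12)
  • January is the first month (01) and August is the eighth month (08)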

Then:

  • I need to iterate over all 203 days
    • and, for each row, check the month and day of its date in order to detect when either one changes

According to the above, I am trying this approach:

# Selecting data value of each day (203 days)
for i in range(203):
    for j in range(1, 9):  # month
        for k in range(12, 32):  # day of the month
            values = my_df[(my_df['Date:'] >= '2018-0{}-{} 00:00:00'.format(j, k))
                           & (my_df['Date:'] <= '2018-0{}-{} 23:59:59'.format(j, k))]
            values.to_csv('Values_day_{}.csv'.format(i), sep=',', header=True, index=False)
      

But I have a problem: when I iterate over range(12,32) for the days of each month, that range is only valid for January, the first month. At least, I think that is the issue ...

In the end I get 203 empty CSV files, so I am doing something wrong ...
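
One likely explanation for the empty files, sketched here without pandas: the outer loop only chooses the file name, while the two inner loops run through every month/day combination and overwrite that same file each time. The last combination is month 8, day 31, which lies after the final row (2018-08-03 23:44:59), so the selection written last is empty. The inner loops also build strings for dates that do not exist (such as 2018-02-30) and never reach days 1 to 11 of any month. The helper name below is hypothetical.

```python
from datetime import datetime


def last_window_written(months=range(1, 9), days=range(12, 32)):
    """Return the last date string the two inner loops would filter on."""
    last = None
    for j in months:
        for k in days:
            last = '2018-0{}-{}'.format(j, k)  # same format as in the loop above
    return last


last = last_window_written()
print(last)  # 2018-08-31

# Timestamp of the final row in the dataset, taken from my_df.tail(1)
last_row = datetime(2018, 8, 3, 23, 44, 59)
window_start = datetime.strptime(last + ' 00:00:00', '%Y-%m-%d %H:%M:%S')

# The last window starts after the data ends, so it selects no rows
print(window_start > last_row)  # True
```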

How can I address this small challenge properly? Any guidance is highly appreciated.

Recommended answer

Something like this? I renamed your original Date: column to Timestamp. I am also assuming that your Date: Series is a pandas datetime Series.

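# Rename the columns; 'Date:' becomes 'Timestamp'
my_df.columns = ['Timestamp', 'Value']
# Calendar date (without the time of day) of each row
my_df['Date'] = my_df['Timestamp'].dt.date
dates = my_df['Date'].unique()
# Write one CSV file per date, named e.g. 2018-01-12.csv
for date in dates:
    f_name = str(date) + '.csv'
    my_df[my_df['Date'] == date].to_csv(f_name)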
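
An alternative sketch of the same idea uses groupby, which splits the frame in a single pass instead of filtering once per date. The sample data, the temporary output directory and the Data_<date>.csv naming pattern are assumptions made for illustration. pd.to_datetime is applied first in case the column was read in as strings.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for my_df (illustrative values only)
my_df = pd.DataFrame({
    'Timestamp': ['2018-01-12 00:17:28', '2018-01-12 00:47:28',
                  '2018-01-13 00:17:28', '2018-08-03 23:44:59'],
    'Value': [1, 0, 1, 1],
})

# Make sure the column holds datetimes rather than strings
my_df['Timestamp'] = pd.to_datetime(my_df['Timestamp'])

out_dir = tempfile.mkdtemp()  # assumed output location
written = []

# Group the rows by calendar date and write one CSV per group
for day, day_rows in my_df.groupby(my_df['Timestamp'].dt.date):
    path = os.path.join(out_dir, 'Data_{}.csv'.format(day))
    day_rows.to_csv(path, sep=',', header=True, index=False)
    written.append(os.path.basename(path))

print(written)
```

Only dates that actually occur in the data produce a group, so this approach does not write empty files for days without rows.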
      
