Extracting data belonging to a day from a given range of dates on a dataset
Problem description
I have a dataset, my_df, with dates ranging from January 12th to August 3rd, 2018, and some values.
The dimensionality of the my_df DataFrame is:

```
my_df.shape
(9752, 2)
```
Each row covers a half-hour shift.
The first row starts on 2018-01-12:

```
my_df.iloc[0]
Date:    2018-01-12 00:17:28
Value    1
Name: 0, dtype: object
```
The last row ends on 2018-08-03:

```
my_df.tail(1)
                    Date:  Value
9751  2018-08-03 23:44:59      1
```
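Both the per-day filter below and the recommended answer behave best when the date column holds real timestamps rather than plain strings (the answer explicitly assumes a pandas datetime Series). A minimal sketch, assuming the column is named 'Date:' as in the printouts and using a tiny made-up frame in place of the real 9752 rows, that converts it with pd.to_datetime:

```python
import pandas as pd

# Tiny made-up stand-in for my_df (the real frame has 9752 rows).
my_df = pd.DataFrame({
    'Date:': ['2018-01-12 00:17:28', '2018-01-12 00:47:28', '2018-08-03 23:44:59'],
    'Value': [1, 1, 1],
})

# Parse the strings into datetime64 values so comparisons, .dt and
# Timedelta arithmetic work on real dates instead of text.
my_df['Date:'] = pd.to_datetime(my_df['Date:'])

print(my_df.dtypes)
```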
My goal is to select the data rows corresponding to each day and export it to a CSV file.
To get only the January 12th data and save it to a readable file, I do the following:
```python
# Selecting data value of each day
# ('Fecha:' is Spanish for "Date:"; presumably the same column shown as 'Date:' above)
my_df_Jan12 = my_df[(my_df['Fecha:'] >= '2018-01-12 00:00:00') &
                    (my_df['Fecha:'] <= '2018-01-12 23:59:59')]
my_df_Jan12.to_csv('Data_Jan_12.csv', sep=',', header=True, index=False)
```
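A variant of this single-day filter uses a half-open interval [day, next day) instead of spelling out 23:59:59. With whole-second timestamps like the ones shown, both forms select the same rows; the half-open form simply does not depend on that. A sketch on made-up rows, assuming a datetime column named 'Date:' (the fractional-second row is invented to show the difference):

```python
import pandas as pd

# Made-up rows around the January 12th boundaries (not the real data).
my_df = pd.DataFrame({
    'Date:': [pd.Timestamp('2018-01-11 23:47:00'),
              pd.Timestamp('2018-01-12 00:17:28'),
              pd.Timestamp('2018-01-12 23:59:59.5'),
              pd.Timestamp('2018-01-13 00:17:28')],
    'Value': [1, 2, 3, 4],
})

# Closed interval as in the question: drops the row at 23:59:59.5.
closed = my_df[(my_df['Date:'] >= '2018-01-12 00:00:00') &
               (my_df['Date:'] <= '2018-01-12 23:59:59')]

# Half-open interval [day, day + 1 day): covers the whole calendar day.
day = pd.Timestamp('2018-01-12')
next_day = day + pd.Timedelta(days=1)
my_df_Jan12 = my_df[(my_df['Date:'] >= day) & (my_df['Date:'] < next_day)]

print(closed['Value'].tolist())       # [2]
print(my_df_Jan12['Value'].tolist())  # [2, 3]
```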
From January 12 to August 3 there are 203 days (29 weeks); counting both January 12 and August 3, that is 204 calendar days.
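The day count can be checked with the standard library: subtracting two dates gives the number of days between them, and adding one counts both endpoints, which is what matters when there is one file per calendar day.

```python
from datetime import date

start = date(2018, 1, 12)
end = date(2018, 8, 3)

span = (end - start).days   # days between the two dates
n_days = span + 1           # calendar days, counting both ends

print(span, n_days, span / 7)  # 203 204 29.0
```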
I don't want to run this query manually for each day, so I am trying the following basic analysis:
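- I need to generate one file per day (204 files, since both January 12 and August 3 are included)
- In January, the days start from the 12th (January 12)
- January is the first month (01) and August is the eighth month (08)
Then: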
- I need to iterate over every day in the whole range
- and, for each date row, check the month and day values in order to detect when each of them changes
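The day-by-day iteration described in these bullets does not need separate month and day loops: pandas can generate one timestamp per calendar day, and the month changes (January 31 to February 1, and so on) are handled for you. A sketch using pd.date_range with the first and last dates from the question:

```python
import pandas as pd

# One Timestamp per calendar day, from the first to the last day in my_df.
days = pd.date_range('2018-01-12', '2018-08-03', freq='D')

print(len(days))                      # 204
print(days[19].strftime('%Y-%m-%d'))  # 2018-01-31
print(days[20].strftime('%Y-%m-%d'))  # 2018-02-01

# Inside a loop, each day gives the bounds [day, day + 1 day) for a filter.
for day in days[:2]:
    print(day.date(), (day + pd.Timedelta(days=1)).date())
```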
According to the above, I am trying this approach:
```python
# Selecting data value of each day (203 days)
for i in range(203):
    for j in range(1, 9):  # month
        for k in range(12, 32):  # days of the month
            values = my_df[(my_df['Fecha:'] >= '2018-0{}-{} 00:00:00'.format(j, k)) &
                           (my_df['Fecha:'] <= '2018-0{}-{} 23:59:59'.format(j, k))]
            values.to_csv('Values_day_{}.csv'.format(i), sep=',', header=True, index=False)
```
But I have a problem: when I iterate over the days of the months with range(12,32), that range only fits the first month, January, I think...
In the end I get 203 empty CSV files, because of something I am doing wrong...
How can I address this small challenge in a suitable way? Any guidance is highly appreciated.
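A quick way to see why every file comes out empty is to replay only the loop structure of the question's code, without any data. For each i, the two inner loops run to completion and each pass writes to the same file name, so only the last (month, day) pair survives; that pair is always 2018-08-31, which is after the last row (2018-08-03). The loop also never produces days 1 to 11 of any month and does produce dates such as 2018-02-30. A sketch that records which date string each file name ends up with:

```python
# Replays the question's loops; records the date whose filter is written last.
written = {}
for i in range(203):
    for j in range(1, 9):        # month
        for k in range(12, 32):  # day of month
            # Every (j, k) pass overwrites the same file name for this i.
            written['Values_day_{}.csv'.format(i)] = '2018-0{}-{}'.format(j, k)

print(written['Values_day_0.csv'])  # 2018-08-31
print(len(set(written.values())))   # 1
```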
Recommended answer
Something like this? I renamed your original column Date: to Timestamp. I am also assuming that the Date: Series you have is a pandas datetime Series.

```python
# Rename the columns: the original 'Date:' column becomes 'Timestamp'
my_df.columns = ['Timestamp', 'Value']
# Calendar date (time part dropped) of every row
my_df['Date'] = my_df['Timestamp'].apply(lambda x: x.date())
dates = my_df['Date'].unique()
# Write one CSV file per distinct date, named e.g. 2018-01-12.csv
for date in dates:
    f_name = str(date) + '.csv'
    my_df[my_df['Date'] == date].to_csv(f_name)
```
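A variant of the same idea groups the rows by calendar day with groupby on .dt.date, so each day's rows are produced in one pass instead of filtering the whole frame once per date. It also passes index=False as in the question. This sketch assumes the column is already datetime64 and still named 'Date:' (no renaming); the sample rows, the temporary output folder and the Values_<date>.csv naming are made up for the demo. Note that only days that actually contain rows get a file.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for my_df: a few half-hour rows spread over two days.
my_df = pd.DataFrame({
    'Date:': pd.to_datetime(['2018-01-12 00:17:28', '2018-01-12 00:47:28',
                             '2018-01-13 00:17:28']),
    'Value': [1, 1, 1],
})

out_dir = tempfile.mkdtemp()  # hypothetical output folder for the demo

# Group rows by calendar day (time part dropped) and write one CSV per day.
for day, day_df in my_df.groupby(my_df['Date:'].dt.date):
    f_name = os.path.join(out_dir, 'Values_{}.csv'.format(day))
    day_df.to_csv(f_name, sep=',', header=True, index=False)

print(sorted(os.listdir(out_dir)))
```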