Python - Writing Multiple Monthly CSV Files from One Large CSV File - Automating Filenames to Reflect Month

Question

I've researched this question heavily for the past few days and I still cannot find suggestions to my problem.

Below is an example of my dataframe titled 'dfs'. There are around 80 columns, only 4 shown in the below example.

dfs is a large dataframe consisting of rows of data reported every 15 minutes over 14 months (2015-08-01 00:00:00 to 2016-09-30 23:45:00). The Datetime column holds datetime values.
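To try the approaches below without the real file, a stand-in for dfs can be generated. This is a minimal sketch assuming the Datetime values serve as a DatetimeIndex (as the slicing in the question implies); the columns `col_a`/`col_b` and their random values are hypothetical placeholders, not the real ~80 columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfs: one row every 15 minutes over the stated range
index = pd.date_range("2015-08-01 00:00:00", "2016-09-30 23:45:00",
                      freq="15min", name="Datetime")
rng = np.random.default_rng(0)
dfs = pd.DataFrame(
    {"col_a": rng.random(len(index)), "col_b": rng.random(len(index))},  # placeholder columns
    index=index,
)

print(len(dfs))                              # 40992
print(dfs.index.to_period("M").nunique())    # 14
```

Each day contributes 24 × 4 = 96 readings, so the 427 days from 2015-08-01 to 2016-09-30 give 40,992 rows spread over 14 calendar months.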

...

...

I want to export (or write) multiple monthly csv files, which are snippets of monthly data taken from the original large csv file (dfs). For each month, I want files to be written that contain the raw data, the day data (6am-6pm) and the night data (6pm-6am). I also want the name of each monthly file to be automated so it knows whether to call itself dfs_%Y%m, dfs_day_%Y%m or dfs_night_%Y%m depending on the data it contains.
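The day/night split can be expressed as a boolean mask on the hour of the index. A minimal sketch, assuming dfs has a DatetimeIndex and that 06:00 belongs to the day window while 18:00 belongs to the night window (the question does not pin down the boundaries); `split_day_night` is a hypothetical helper name.

```python
import pandas as pd


def split_day_night(df):
    """Split a DatetimeIndex-ed frame into day (06:00-17:59) and night rows."""
    hours = df.index.hour
    is_day = (hours >= 6) & (hours < 18)   # 6am up to, but not including, 6pm
    return df[is_day], df[~is_day]         # night = every remaining row (6pm-6am)


# Small example: one day of 15-minute readings
idx = pd.date_range("2015-08-01 00:00", "2015-08-01 23:45", freq="15min")
sample = pd.DataFrame({"value": range(len(idx))}, index=idx)
day, night = split_day_night(sample)
print(len(day), len(night))   # 48 48
```

Taking the night rows as the complement (`~is_day`) guarantees that every reading lands in exactly one of the two files.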

At the moment I am writing out over 180 lines of code by hand to export all of these csv files.

For example:

I create the monthly raw, day and night files by taking the data between the datetimes listed below from the indexed 'Datetime' column:

dfs201508 = dfs.loc['2015-08-01 00:00:00':'2015-08-31 23:45:00']

dfs201508Day = dfsDay.loc['2015-08-01 00:00:00':'2015-08-31 23:45:00']

dfs201508Night = dfsNight.loc['2015-08-01 00:00:00':'2015-08-31 23:45:00']
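With a DatetimeIndex, pandas also accepts a partial date string in `.loc`, so a whole month can be selected without spelling out its first and last timestamps. A minimal sketch on synthetic data (the frame below is a hypothetical stand-in for dfs):

```python
import pandas as pd

# Hypothetical stand-in for dfs: 15-minute readings for August and September 2015
idx = pd.date_range("2015-08-01 00:00", "2015-09-30 23:45", freq="15min")
dfs = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Partial string indexing: '2015-08' selects every row in August 2015
dfs201508 = dfs.loc["2015-08"]

# Same rows as the explicit timestamp slice used in the question
explicit = dfs.loc["2015-08-01 00:00:00":"2015-08-31 23:45:00"]
print(len(dfs201508), dfs201508.equals(explicit))   # 2976 True
```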

Then I export these files to their respective output paths and give them filenames:

dfs201508.to_csv(outputpath + "dfs201508.csv")

dfs201508Day.to_csv(outputpathDay + "dfs_day_201508.csv")

dfs201508Night.to_csv(outputpathNight + "dfs_night_201508.csv")
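When `DataFrame.to_csv` is given a path it writes the file and returns None, so its result should not be assigned back to the DataFrame's name; calling it as a plain statement keeps the DataFrame usable afterwards. A minimal sketch, using a temporary directory as a hypothetical stand-in for outputpath; `os.path.join` is used so the folder does not need a trailing separator.

```python
import os
import tempfile

import pandas as pd

idx = pd.date_range("2015-08-01", periods=4, freq="15min")
dfs201508 = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

outputpath = tempfile.mkdtemp()  # stand-in for the real output folder

# os.path.join inserts the separator, so outputpath need not end with one
target = os.path.join(outputpath, "dfs201508.csv")
result = dfs201508.to_csv(target)   # writes the file and returns None

print(result)                    # None
print(os.path.exists(target))    # True
print(type(dfs201508).__name__)  # DataFrame (the frame itself is untouched)
```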

What I would like to write is something like this (pseudocode):

dfs_%Y%m = dfs.ix["%Y%m"]

dfs_day_%Y%m = dfs.ix["%Y%m(between 6am-6pm)"]

dfs_night_%Y%m = dfs.ix["%Y%m(between 6pm-6am)"]

dfs_%Y%m = dfs_%Y%m.to_csv(outputpath +"dfs_%Y%m.csv")

dfs_day_%Y%m = dfs_day_%Y%m.to_csv(outputpath%day +"dfs_day_%Y%m.csv")

dfs_night_%Y%m = dfs_night_%Y%m.to_csv(outputpath%night +"dfs_night_%Y%m.csv")
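The pseudocode above can be made concrete by grouping on the calendar month and building each filename with `strftime`, which understands the same `%Y%m` codes. A minimal sketch, assuming dfs has a DatetimeIndex and that "day" means 06:00-17:59; `export_monthly` is a hypothetical helper, and the three output folders are passed in as arguments rather than taken from the question's outputpath variables.

```python
import os
import tempfile

import pandas as pd


def export_monthly(dfs, outputpath, outputpath_day, outputpath_night):
    """Write raw, day and night CSVs for every month present in dfs.

    Assumes dfs has a DatetimeIndex. Returns the list of written file paths.
    """
    written = []
    # Group rows by calendar month; each key is a pandas Period such as 2015-08
    for period, month_df in dfs.groupby(dfs.index.to_period("M")):
        tag = period.strftime("%Y%m")                 # e.g. '201508'
        hours = month_df.index.hour
        is_day = (hours >= 6) & (hours < 18)          # assumption: 06:00-17:59 is day
        outputs = [
            (month_df, outputpath, f"dfs_{tag}.csv"),
            (month_df[is_day], outputpath_day, f"dfs_day_{tag}.csv"),
            (month_df[~is_day], outputpath_night, f"dfs_night_{tag}.csv"),
        ]
        for frame, folder, name in outputs:
            path = os.path.join(folder, name)
            frame.to_csv(path)                        # writes the file; returns None
            written.append(path)
    return written


# Example run: two months of synthetic 15-minute data, written to a temp folder
base = tempfile.mkdtemp()
folders = []
for sub in ("raw", "day", "night"):
    folder = os.path.join(base, sub)
    os.makedirs(folder)
    folders.append(folder)

idx = pd.date_range("2015-08-01 00:00", "2015-09-30 23:45", freq="15min", name="Datetime")
dfs = pd.DataFrame({"value": range(len(idx))}, index=idx)

paths = export_monthly(dfs, *folders)
print([os.path.basename(p) for p in paths])
```

Because `groupby` only yields months that actually occur in the data, no empty files are produced, and the zero-padded `%Y%m` tag sorts chronologically in a file listing.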

Any suggestions on the code to automate this process would be greatly appreciated.

Here are some links to pages I researched:

https://www.youtube.com/watch?v=aeZKJGEfD7U

Writing multiple Python dictionaries to a csv file

Opening filename + date in csv format in Python

Recommended answer

You can use a for loop to iterate over the years and months contained within dfs. I created a dummy dataframe called DF in the below example, which contains just three sample columns:

dates               Egen1_kwh   Egen2_kwh
2016-01-01 00:00:00 15895880    15877364
2016-01-01 00:15:00 15895880    15877364
2016-01-01 00:30:00 15895880    15877364
2016-01-01 00:45:00 15895880    15877364
2016-01-01 01:00:00 15895880    15877364

The code below filters the main dataframe DF into smaller dataframes (NIGHT and DAY) for each month within each year and saves them as .csv files with names corresponding to their date (e.g. 2016_1_DAY and 2016_1_NIGHT for Jan 2016 Day and Jan 2016 Night).
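The per-month filter is easiest to get right as a half-open interval: from midnight on the 1st up to, but not including, midnight on the 1st of the following month. A minimal sketch on a synthetic `dates` Series (built like the DF.dates column in the answer), comparing that with an upper bound of the last day at 00:00, which drops the rest of the last day:

```python
import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

# Synthetic 15-minute timestamps for January and February 2016
dates = pd.Series(pd.date_range("2016-01-01", "2016-02-29 23:45", freq="15min"))

year, month = 2016, 1
start = datetime.datetime(year, month, 1)
next_month = start + relativedelta(months=1)            # 2016-02-01 00:00

# Half-open interval [start, next_month): keeps every reading in January
half_open = dates[(dates >= start) & (dates < next_month)]

# Closed interval ending at the last day at 00:00 loses 31 Jan 00:15-23:45
last_day_midnight = next_month - relativedelta(days=1)  # 2016-01-31 00:00
closed = dates[(dates >= start) & (dates <= last_day_midnight)]

print(len(half_open), len(closed))   # 2976 2881
```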

import datetime
from random import randint

import pandas as pd
from dateutil.relativedelta import relativedelta

# I defined a sample dataframe with dummy data
start = datetime.datetime(2016, 1, 1, 0, 0)
dates = [start + relativedelta(minutes=15*i) for i in range(0, 10000)]
Egen1_kwh = randint(15860938, 15898938)
Egen2_kwh = randint(15860938, 15898938)

DF = pd.DataFrame({
        'dates': dates,
        'Egen1_kwh': Egen1_kwh,
        'Egen2_kwh': Egen2_kwh,
    })


# define when day starts and ends (MUST USE 24-HOUR CLOCK)
day = {
        'start': datetime.time(6, 0),  # starts at 6am (6:00)
        'end': datetime.time(18, 0)  # ends at 6pm (18:00)
      }

# time of day of every row (datetime.time objects), computed once
times = DF.dates.dt.time

# capture years that appear in dataframe
min_year = DF.dates.min().year
max_year = DF.dates.max().year

# iterate over each year and each month within each year
for year in range(min_year, max_year + 1):
    for month in range(1, 13):

        # rows from the 1st of this month up to (not including) the 1st of the next
        month_start = datetime.datetime(year, month, 1)
        month_end = month_start + relativedelta(months=1)
        in_month = (DF.dates >= month_start) & (DF.dates < month_end)

        # filter to show NIGHT and DAY dataframe for given month within given year
        NIGHT = DF[in_month & ((times <= day['start']) | (times >= day['end']))]
        DAY = DF[in_month & (times > day['start']) & (times < day['end'])]

        # save to .csv with date and time in file name
        # specify the save path of your choice
        path_night = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_NIGHT.csv'.format(year, month)
        path_day = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_DAY.csv'.format(year, month)

        # some of the above NIGHT / DAY filtering will return no rows.
        # Check for this, and only save if the dataframe contains rows
        if not NIGHT.empty:
            NIGHT.to_csv(path_night, index=False)
        if not DAY.empty:
            DAY.to_csv(path_day, index=False)
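The answer names files like 2016_1_NIGHT.csv. If the names requested in the question (dfs_%Y%m, dfs_day_%Y%m, dfs_night_%Y%m) are preferred, they can be built from the same `year` and `month` loop variables with `strftime`. A minimal sketch; `monthly_filenames` is a hypothetical helper and the output folder is left out:

```python
import datetime


def monthly_filenames(year, month):
    """Return raw, day and night CSV names in the dfs_%Y%m style."""
    first = datetime.date(year, month, 1)
    return (
        first.strftime("dfs_%Y%m.csv"),
        first.strftime("dfs_day_%Y%m.csv"),
        first.strftime("dfs_night_%Y%m.csv"),
    )


print(monthly_filenames(2016, 1))
# ('dfs_201601.csv', 'dfs_day_201601.csv', 'dfs_night_201601.csv')
```

Zero-padding the month (`%m`) also keeps the files in chronological order when sorted by name, which 2016_1 / 2016_10 style names do not.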
