Get file created date - add to dataframes column on read_csv


Question

I need to pull many (hundreds of) CSVs into a pandas dataframe. As each CSV file is read in, I need to add a column holding the date that file was created. I can obtain the date of creation for a CSV file using this call:

time.strftime('%m/%d/%Y', time.gmtime(os.path.getmtime('/path/file.csv')))

As an FYI, this is the command I am using to read in the CSVs:

path1 = r'/path/'
all_files_standings = glob.glob(path1 + '/*.csv')
standings = pd.concat((pd.read_csv(f, low_memory=False, usecols=[7, 8, 9]) for f in all_files_standings))

I tried running this call (which worked):

dt_gm = [time.strftime('%m/%d/%Y', time.gmtime(os.path.getmtime('/path/file.csv')))]

So I tried to extend it to every file:

dt_gm = [time.strftime('%m/%d/%Y', time.gmtime(os.path.getmtime(f) for f in all_files_standings))]

I get this error:

TypeError: an integer is required (got type generator)

How can I fix this?

Recommended answer

If the different files have the same columns and you would like to append them as rows:

import pandas as pd
import time
import os

# list of files you want to read
files = ['one.csv', 'two.csv']

column_names = ['c_1', 'c_2', 'c_3']

all_dataframes = []
for file_name in files:
    df_temp = pd.read_csv(file_name, delimiter=',', header=None)
    df_temp.columns = column_names
    df_temp['creation_time'] = time.strftime('%m/%d/%Y', time.gmtime(os.path.getmtime(file_name)))
    df_temp['file_name'] = file_name
    all_dataframes.append(df_temp)

df = pd.concat(all_dataframes, axis=0, ignore_index=True)

df
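
As for the TypeError in the question: the `for f in all_files_standings` clause sits inside the parentheses of `time.gmtime(...)`, so `gmtime` receives a generator object instead of a number. Moving the `for` clause outside the call, so that it wraps the whole expression, gives one formatted date per file. A minimal sketch follows; the helper name `modified_dates` is hypothetical and not part of the original answer.

```python
import os
import time


def modified_dates(paths, fmt='%m/%d/%Y'):
    """Return one formatted date string per path.

    Hypothetical helper for illustration. The date comes from
    os.path.getmtime, the same call used in the question.
    """
    # The `for` clause wraps the whole expression, so time.gmtime
    # receives a single float for each path, not a generator.
    return [
        time.strftime(fmt, time.gmtime(os.path.getmtime(p)))
        for p in paths
    ]
```

Called as `modified_dates(all_files_standings)`, this should return a list in the same order as the file list, so each date can be paired with its file (for instance with `zip`) before being assigned to a column.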


If you want to append the different files by columns:

all_dataframes = []
for idx, file_name in enumerate(files):
    df_temp = pd.read_csv(file_name, delimiter=',', header=None)
    column_prefix = 'f_' + str(idx) + '_'
    df_temp.columns = [column_prefix + c for c in column_names]
    df_temp[column_prefix + 'creation_time'] = time.strftime('%m/%d/%Y', time.gmtime(os.path.getmtime(file_name)))
    all_dataframes.append(df_temp)

pd.concat(all_dataframes, axis=1)
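
One caveat worth keeping in mind: `os.path.getmtime` returns the time of last modification, not a true creation time. If a real creation date matters, `os.stat` exposes `st_birthtime` on some platforms (macOS, for example), but the attribute is not available everywhere. The sketch below is an assumption-laden variant, with the hypothetical helper name `file_date`, that uses the birth time when the platform reports one and otherwise falls back to the modification time.

```python
import os
import time


def file_date(path, fmt='%m/%d/%Y'):
    """Format a file's birth time if available, else its mtime.

    Hypothetical helper for illustration only.
    """
    st = os.stat(path)
    # st_birthtime exists only on some platforms; where it is
    # missing, fall back to the last-modification time.
    timestamp = getattr(st, 'st_birthtime', st.st_mtime)
    return time.strftime(fmt, time.gmtime(timestamp))
```

In either loop above, `file_date(file_name)` could stand in for the `time.strftime(...)` expression. On systems without `st_birthtime` the result is the same modification date as before.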

