Loop through files in a directory, add a date column in pandas

Problem description

All of my files have the following titles and they stretch back for a few years. I want to be able to read each file and then add the date from the file name as a column.

Filetype as of 2015-04-01.csv

import os
import pandas as pd

path = 'C:\\Users\\'
filelist = os.listdir(path)     # All of the .csv files I am working with
file_count = len(filelist)      # I thought I could do a for loop and use this as the range
df = pd.Series(filelist)        # I only added this because I couldn't get the date from a list
date_name = df.str[15:-4]       # This gives me the date

So what I have tried is:

for file in filelist:
    df = pd.read_csv(file)

Now I want to take the date_name from the file name and add a column called date. Every file is exactly the same but I want to track changes over time and the only date is found just on the name of the file.

Then I would append it.

import glob
import pandas as pd

path = 'C:\\Users\\'
filelist = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list = []
for file in filelist:
    df = pd.read_csv(file)
    list.append(df)
frame = pd.concat(list)

How can I add the date_name to the file/dataframe? 1) Read the file, 2) Add the date column based on the file name, 3) Read the next file, 4) Add the date column, 5) Append, 6) Repeat for all files in the path

Edit: I think I got something to work. Is this the best way? Can someone explain what list = [] and the lines that use it are doing?

path = 'C:\\Users\\'
filelist = os.listdir(path) 
list = []
frame = pd.DataFrame()
for file in filelist:
    df2 = pd.read_csv(path+file)
    date_name = file[15:-4]
    df2['Date'] = date_name
    list.append(df2)
frame = pd.concat(list)

Recommended answer

This seems like a reasonable way to do it. list = [] creates an empty list, append adds each frame to that list as you loop through the files, and pd.concat takes the list of pandas objects and concatenates them into one DataFrame. I see two things to change, though.

  1. You don't need frame = pd.DataFrame(). It does nothing, because the DataFrames are appended to the list and frame is only assigned later by pd.concat.
  2. I'd change the name of the variable list to something else. Maybe frames, as it describes the contents and doesn't already mean something (list is a Python built-in).
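
Putting both changes together, the loop might look roughly like the sketch below. It is only an illustration under some assumptions: every file name follows the 'Filetype as of 2015-04-01.csv' pattern from the question, so the date sits between character 15 and the '.csv' extension. The function name load_with_dates, the prefix_len parameter, the switch to glob with os.path.basename, and the conversion with pd.to_datetime are choices made here, not part of the original code.

```python
import glob
import os

import pandas as pd


def load_with_dates(path, prefix_len=15):
    """Read every CSV in `path` and tag its rows with the date in the file name.

    Hypothetical helper: assumes names like 'Filetype as of 2015-04-01.csv',
    so the date starts at character `prefix_len` and ends before '.csv'.
    """
    frames = []  # starts empty and collects one DataFrame per file
    for file in sorted(glob.glob(os.path.join(path, "*.csv"))):
        df = pd.read_csv(file)
        name = os.path.basename(file)  # file name without the directory
        # The single date value is broadcast to every row of this file
        df["Date"] = pd.to_datetime(name[prefix_len:-4])
        frames.append(df)
    # One concat at the end; pd.concat raises ValueError if frames is empty
    return pd.concat(frames, ignore_index=True)
```

With the path from the question this would be called as frame = load_with_dates('C:\\Users\\'). Collecting the frames in a list and concatenating once at the end avoids copying the growing result on every iteration.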
