Python: Looping through directory and saving each file using filename as data frame name


Question

In R there is a function called assign, which assigns a value to a name in the environment.

For example:

assign("Hello", 2)
> Hello
[1] 2

In Python I can't seem to do the same. I initially tried:

import numpy as np
import pandas as pd
import os

for file in os.listdir('C:\\Users\\Olivia\\Documents'):
    if file.endswith(".csv"):
        # Invalid: the result of a function call cannot be an assignment target
        os.path.splitext(file)[0] = pd.read_csv('C:\\Users\\Olivia\\Documents\\' + file)

But I can see this tries to assign a data frame to a string, which doesn't work.

I managed to get all the files into a list by doing:

import glob
import os

import pandas as pd

dl = glob.glob(r'C:\Users\Olivia\Documents\*.csv')
nl = []
for i in dl:
    pl = i.split(os.sep)
    name = pl[-1][:-4]  # last path component, minus the ".csv" suffix
    nl.append(name)

ddict = {}

for k, v in zip(nl, dl):
    ddict[k] = ddict.get(k, "") + v

dfl = []

for k, v in ddict.items():
    dfl.append(pd.read_csv(v))

But now how do I get each data frame out of the list and name it after its file, without the extension? There must be a way to assign each data frame in the list to a name from the file list.

Recommended answer

Honestly, you were on the right track with your first method. Unfortunately, Python doesn't give you a clean way to create a "variable number of variables" dynamically, as you have already tried and realised. However, you can create a dictionary and assign data frames to string keys as you like. Here's how.

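import os

import pandas as pd

# Raw string, so the backslashes are not treated as escape sequences
root = r'C:\Users\Olivia\Documents'

ddict = {}
for file in os.listdir(root):
    if file.endswith(".csv"):
        name = os.path.splitext(file)[0]  # file name without the extension
        ddict[name] = pd.read_csv(os.path.join(root, file))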

Another way of building this dictionary is using a dict comprehension:

ddict = {
    os.path.splitext(file)[0]: pd.read_csv(os.path.join(root, file))
    for file in os.listdir(root)
    if file.endswith('.csv')
}

Now, referring to a single data frame is as easy as

ddict['your_file_name']
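
Once the frames are in a dictionary, looping over all of them is just as simple as looking one up. The following is a minimal sketch rather than part of the original answer: it uses pathlib instead of os.listdir, the helper name load_csv_dir is hypothetical, and the file names and contents are made up so the example can run in a throwaway directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_dir(root):
    """Hypothetical helper: map each CSV's base name to its DataFrame."""
    # Path.stem is the file name without its final extension, the same
    # value os.path.splitext(file)[0] gives for a bare file name.
    return {
        path.stem: pd.read_csv(path)
        for path in sorted(Path(root).glob("*.csv"))
    }


# Demo on a temporary directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sales.csv").write_text("a,b\n1,2\n3,4\n")
    Path(tmp, "costs.csv").write_text("a,b\n5,6\n")
    Path(tmp, "notes.txt").write_text("not a csv\n")  # skipped by the glob

    frames = load_csv_dir(tmp)

# Look up one frame by name, or loop over all of them.
sales = frames["sales"]
for name, df in frames.items():
    print(name, df.shape)
```

Keeping the names as dictionary keys means they stay ordinary data: they can be listed, filtered, or sorted, which is awkward to do with dynamically created variables.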

Another thing to note: the safest way to build file paths is os.path.join. It inserts the correct separator for the platform, which a plain + does not.
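
To make the difference concrete, here is a small sketch comparing the two approaches. The directory and file names are placeholders chosen for illustration, not paths from the question.

```python
import os

root = "data"        # hypothetical directory name
file = "sales.csv"   # hypothetical file name

# Plain concatenation silently drops the separator.
broken = root + file

# os.path.join inserts exactly one platform-appropriate separator...
joined = os.path.join(root, file)

# ...and does not double it when the directory already ends with one.
joined_trailing = os.path.join(root + os.sep, file)

print(broken)
print(joined)
print(joined == joined_trailing)
```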
