Creating multiple dataframes with a loop

Question

This undoubtedly reflects a lack of knowledge on my part, but I can't find anything online to help. I am very new to programming. I want to load 6 CSV files and do a few things to each of them before combining them later. The following code iterates over each file but only creates one dataframe, called df.

import pandas as pd

files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
dfs = ('df1', 'df2', 'df3', 'df4', 'df5', 'df6')
for df, file in zip(dfs, files):
    df = pd.read_csv(file)
    print(df.shape)
    print(df.dtypes)
    print(list(df))

Recommended answer

I think you expect your code to do something that it is not actually doing.

Specifically, this line: df = pd.read_csv(file)

You might think that on each iteration of the for loop this line is rewritten before it runs, with df replaced by a string from dfs and file replaced by a filename from files. The latter is true; the former is not.

Each iteration of the for loop reads a CSV file and stores the result in the variable df, overwriting the dataframe that was read during the previous iteration. In other words, df in your for loop is not replaced with the variable names you listed in dfs; it is an ordinary variable that is simply reassigned.

The key takeaway here is that strings (e.g., 'df1', 'df2', etc.) are not substituted into your code and used as variable names when it executes.
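To make the point concrete, here is a minimal sketch that needs no CSV files. The tuples `names` and `values` are stand-ins invented for this illustration (they play the roles of dfs and the dataframes); the loop has the same shape as the one in the question.

```python
names = ('df1', 'df2', 'df3')
values = (10, 20, 30)

for df, value in zip(names, values):
    # At the start of each iteration df holds a string from names;
    # the assignment below rebinds df to the value instead.
    df = value

# Only the single variable df exists, holding the last value assigned.
print(df)

# No variable called df1 was ever created, so looking it up fails.
try:
    df1
except NameError:
    df1_exists = False
else:
    df1_exists = True
print(df1_exists)
```

After the loop, df holds only the last value and looking up df1 raises NameError, which is the same thing that happens to the six dataframes in the question.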

One way to achieve the result you want is to store each dataframe returned by pd.read_csv() in a dictionary, where the key is the name of the dataframe (e.g., 'df1', 'df2', etc.) and the value is the dataframe itself.

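# Despite its name, list_of_dfs is a dictionary keyed by 'df1', 'df2', ...
list_of_dfs = {}
for df, file in zip(dfs, files):
    # Here df is the string key; the dataframe is stored under that key.
    list_of_dfs[df] = pd.read_csv(file)
    print(list_of_dfs[df].shape)
    print(list_of_dfs[df].dtypes)
    print(list(list_of_dfs[df]))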

You can then reference each of your dataframes like this:

print(list_of_dfs['df1'])
print(list_of_dfs['df2'])
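Since the question mentions combining the dataframes later, the sketch below shows one way that could look. It is an illustration under stated assumptions, not part of the original answer: the two tiny CSV files are generated in a temporary directory only so the example runs anywhere, and the names `sample_files`, `frames`, and `combined` are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: write two small sample CSV files so the sketch is runnable;
# in practice these would be the real data1.csv ... data6.csv.
tmp_dir = Path(tempfile.mkdtemp())
sample_files = []
for i in (1, 2):
    path = tmp_dir / f"data{i}.csv"
    path.write_text(f"a,b\n{i},{i * 10}\n")
    sample_files.append(path)

# Build the dictionary in one step, deriving the keys 'df1', 'df2', ...
# from each file's position in the list.
frames = {
    f"df{i}": pd.read_csv(path)
    for i, path in enumerate(sample_files, start=1)
}

# Inspect or process each dataframe by iterating over the dictionary.
for name, frame in frames.items():
    print(name, frame.shape)

# Combine them; passing a dict to pd.concat uses the keys as the outer
# index level, so each row remembers which dataframe it came from.
combined = pd.concat(frames, names=["source", "row"])
print(combined)
```

Keeping the dataframes in a dictionary means any per-file processing can happen inside the loop, and the final combination is a single call.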

You can learn more about dictionaries here:

https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries
