Extract dataframe from dictionary by name

Question

I wrote a loop that iterates over the CSV files in a folder, reads each one into a dictionary of dataframes, and names each dataframe after its file (e.g. file1.csv becomes file1_df). I do some work on the data and generate new rows, then subset part of each dataframe into a new dataframe (file1_df2). I would like to reference these dataframes later, outside of the dictionary.

    import os
    import pandas as pd

    df_dict = {}
    for file in os.listdir(datadir):  # datadir: path to a folder that contains only CSV files
        df_name = os.path.splitext(file)[0] + '_df'  # Drop the .csv extension: file1.csv -> file1_df
        df_dict[df_name] = pd.read_csv(os.path.join(datadir, file))

Is it possible to reference these dataframes by name, so that later I can just write file1_df2 instead of df_dict["file1_df2"]?

In essence I am asking the same question as here. It doesn't look like that one was answered either, so I suspect this might not be possible, but I have yet to find an answer that explicitly says so.

I know this is possible in languages like SAS and Stata, but I have never figured out how to do it in Python. In those languages, you can plug a placeholder variable directly into the name of something.

/* In SAS */
%let param = test1;
libname path "C:\User\&param.";

proc sql;
create table &param._df as
select * from path.&param.;
quit;

/* In Stata */
foreach i in file1 file2 {
    import delimited "`i'.csv", clear
    save "`i'.dta", replace
}

And so on. If this is not possible, I would like to know that with certainty. Thank you!

Recommended answer

The lack of answers is likely because nobody can really tell why you want to do this. The question seems to stem from applying a SAS/Stata workflow to Python, where it doesn't really make sense.

However, I think this does what you're asking:

import pandas as pd
my_csvs = ["name1.csv", "name2.csv", "name3.csv"]
my_dfs = [pd.read_csv(csv) for csv in my_csvs]
df_dict = {name.replace(".csv", "_df"): df for name, df in zip(my_csvs, my_dfs)}

# Access a dataframe by its key (this is the advisable method!)
csv2 = df_dict["name2_df"]
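
If the goal is mainly to avoid the bracket-and-quote syntax, one option (not part of the original answer) is to wrap the dictionary in `types.SimpleNamespace` from the standard library, which gives attribute access without creating loose variables. A minimal sketch, assuming small in-memory frames in place of the CSV files; the name `dfs` is only illustrative:

```python
from types import SimpleNamespace

import pandas as pd

# Stand-ins for the frames that pd.read_csv would return (assumed data).
df_dict = {
    "name1_df": pd.DataFrame({"x": [1, 2]}),
    "name2_df": pd.DataFrame({"x": [3, 4, 5]}),
}

# Each dictionary key becomes an attribute on the namespace object.
dfs = SimpleNamespace(**df_dict)

# dfs.name2_df and df_dict["name2_df"] refer to the same object, not a copy.
print(dfs.name2_df is df_dict["name2_df"])
print(len(dfs.name2_df))
```

Attribute access only works for keys that are valid Python identifiers, so file names containing spaces or hyphens would still need the dictionary lookup.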

Then we can add these keys to our namespace with an exec() call:

# now add them to the namespace
for k in df_dict.keys():
    exec(f"{k} = df_dict['{k}']")
    # or use "{k} = df_dict['{k}']".format(k=k) on Python < 3.6 (f-strings need 3.6+)

# Now does this work?
print(name2_df)

And this actually does work. However, most IDEs and linters will flag the last line, because static analysis cannot see that the variable was ever declared.
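
The same effect seems achievable without building source strings: at module level, `globals()` returns the module's namespace dictionary, and updating it creates the variables. A sketch under stated assumptions (in-memory stand-ins for the CSV data; the `safe` filter is an illustrative addition, not part of the original answer):

```python
import pandas as pd

# Stand-ins for the frames that pd.read_csv would return (assumed data).
df_dict = {
    "name1_df": pd.DataFrame({"x": [1, 2]}),
    "name2_df": pd.DataFrame({"x": [3, 4, 5]}),
}

# Keep only keys that can legally be used as variable names.
safe = {k: v for k, v in df_dict.items() if k.isidentifier()}

# Bind every remaining key as a module-level variable.
globals().update(safe)

print(name2_df.shape)  # noqa: F821 (the name is created dynamically above)
```

Like the `exec()` version, this is only reliable at module level or in an interactive session. Inside a function, neither approach creates local variables, which is one more reason to prefer the plain dictionary lookup.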
