Extract dataframe from dictionary by name
Question
I have made a loop where I iterate over (csv) files in a folder, read them into a dictionary of dataframes and name them after the csv file (e.g. file1.csv becomes file1_df). I do some work on the data and generate new rows, then I try to subset part of my dataframes into a new dataframe (file1_df2). I would like to later reference these dataframes outside of the dictionary.
import os
import pandas as pd

df_dict = {}
for file in os.listdir(datadir):  # Loop over the files in that folder (only has CSV files)
    df_name = file[:-4] + '_df'  # Trim off .csv to name the dataframe
    df_dict[df_name] = pd.read_csv(os.path.join(datadir, file))
Is it possible to reference these dataframes by name, so that later I can just call file1_df2 instead of df_dict["file1_df2"]?
In essence I am asking the same question as here. It doesn't look like he got this answered either, so I think this might not be possible, but I have yet to find an answer that explicitly says it isn't.
I know this is possible in languages like SAS and Stata, but I have never figured out how to do it in Python. In those languages, you can plug your placeholder variable directly into the name of something.
/* In SAS */
%let param = test1;
libname path "C:\User\&param.";
proc sql;
    create table &param._df as
    select * from path.&param.;
quit;
/* In Stata */
foreach i in file1 file2 {
import delimited "`i'.csv", clear
save "`i'.dta", replace
}
etc. If this is not possible, I would like to know that with certainty. Thank you!
Recommended answer
The lack of answers is likely because nobody can really tell WHY you want to do this. The question seems to stem from applying a SAS / Stata workflow to Python, where it just doesn't make any sense.
However, I think this does what you're asking:
import pandas as pd
my_csvs = ["name1.csv", "name2.csv", "name3.csv"]
my_dfs = [pd.read_csv(csv) for csv in my_csvs]
df_dict = {name.replace(".csv", "_df"): df for name, df in zip(my_csvs, my_dfs)}
# access dataframes with (advisable to use this method!)
csv2 = df_dict["name2_df"]
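One practical reason to stay with the dictionary is that derived frames, such as the file1_df2 subset mentioned in the question, can be built for every file in one pass and stored under predictable keys. The following is only a minimal sketch under stated assumptions: each "frame" is a plain list of row dicts standing in for a dataframe so that the snippet runs without any CSV files, and the column name "x" and the threshold 3 are hypothetical.

```python
# Stand-ins for dataframes: each "frame" is a list of row dicts (assumption).
df_dict = {
    "file1_df": [{"x": 1}, {"x": 5}],
    "file2_df": [{"x": 7}, {"x": 2}],
}

# Build one subset per frame and store it under "<name>2",
# so "file1_df" yields "file1_df2" as in the question.
subsets = {
    f"{name}2": [row for row in rows if row["x"] > 3]
    for name, rows in df_dict.items()
}
df_dict.update(subsets)

# Every original and derived frame is now reachable by its key.
for name in sorted(df_dict):
    print(name, len(df_dict[name]))
```

With real dataframes the filter would presumably be a boolean mask such as df[df["x"] > 3] instead of the list comprehension; the key handling stays the same.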
Then, we can add these keys to our namespace with an exec() call:
# now add them to the namespace
for k in df_dict.keys():
    exec(f"{k} = df_dict['{k}']")
    # or use "{k} = df_dict['{k}']".format(k=k) for Python < 3.6 (no f-strings)

# Now does this work?
print(name2_df)
And this actually does work. However, any IDE is going to flag the last line, because it doesn't seem like you've declared that variable.
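If bare names are really wanted, two alternatives avoid assembling source strings for exec(). This is a sketch, not the answer's own method: the values are plain lists standing in for dataframes (an assumption, so the snippet runs without CSV files), and frames is a hypothetical name. globals().update() binds each key as a module-level variable, while types.SimpleNamespace gives attribute access on one explicitly declared object.

```python
from types import SimpleNamespace

# Plain lists stand in for dataframes here (assumption).
df_dict = {"name1_df": [1, 2], "name2_df": [3, 4]}

# Option 1: bind every key as a module-level variable, without exec().
# Keys must be valid Python identifiers, and any existing global
# with the same name is silently overwritten.
globals().update(df_dict)
print(globals()["name2_df"] is df_dict["name2_df"])  # True

# Option 2: attribute access on a single declared object.
frames = SimpleNamespace(**df_dict)
print(frames.name2_df)  # [3, 4]
```

Option 2 is probably the closer match to the SAS/Stata habit without hiding variables from the IDE, since frames itself is declared in the code. Note also that the exec() approach relies on running at module level: inside a function, exec() cannot create new local variables.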