Extracting dataframes from a dictionary of dataframes
Question
I have a directory containing many CSV files, which I have loaded into a dictionary of dataframes.
Here are three small sample CSV files to illustrate:
import os
import csv
import pandas as pd

# create 3 small csv files for test purposes
os.chdir('c:/test')

with open('dat1990.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    data = [['Stock', 'Sales', 'Year'],
            ['100', '24', '1990'],
            ['120', '33', '1990'],
            ['23', '5', '1990']]
    a.writerows(data)

with open('dat1991.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    data = [['Stock', 'Sales', 'Year'],
            ['400', '35', '1991'],
            ['450', '55', '1991'],
            ['34', '6', '1991']]
    a.writerows(data)

with open('other1991.csv', 'w', newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    data = [['Stock', 'Sales', 'Year'],
            ['500', '56', '1991'],
            ['600', '44', '1991'],
            ['56', '55', '1991']]
    a.writerows(data)
Create a dictionary for processing the CSV files into dataframes:
dfcsv_dict = {'dat1990': 'dat1990.csv', 'dat1991': 'dat1991.csv',
              'other1991': 'other1991.csv'}
Create a simple function for importing a CSV file into pandas:
def myimport(csvfile):
    return pd.read_csv(csvfile)
Iterate through the dictionary to import all CSV files into pandas dataframes:
df_dict = {}
for k, v in dfcsv_dict.items():
    df_dict[k] = myimport(v)
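With many files in the directory, the name-to-file mapping does not have to be typed by hand. The following is a minimal sketch, assuming the dictionary keys should be the filenames without their extension. The temporary folder and its two one-row files are hypothetical stand-ins for c:/test, used only so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for c:/test: a temporary folder with two tiny CSV files
folder = Path(tempfile.mkdtemp())
(folder / 'dat1990.csv').write_text('Stock,Sales,Year\n100,24,1990\n')
(folder / 'dat1991.csv').write_text('Stock,Sales,Year\n400,35,1991\n')

# Key = filename without extension, value = dataframe read from that file
df_dict = {p.stem: pd.read_csv(p) for p in sorted(folder.glob('*.csv'))}
```

Pointing `folder` at the real directory would give the same kind of dictionary as the loop above.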
Given that I may now have thousands of dataframes within the single dictionary object, how can I select a few and "extract" them out of the dictionary?
For example, how would I extract just two of the three dataframes nested in the dictionary, something like
dat1990 = df_dict['dat1990']
dat1991 = df_dict['dat1991']
but without using literal assignments? Maybe some sort of looping structure over the dictionary, ideally with a way to select a subgroup based on a substring of the dictionary key, e.g. all dataframes whose name contains dat or 1991.
I don't want another "sub-dictionary"; I want to extract them as named, standalone dataframes, as the code above illustrates.
I am using Python 3.5.
Recommended answer
This is an old question from Jan 2016, but since no one answered, here is an answer from Oct 2019. It might be useful for future reference.
I think you can skip the step of creating a dictionary of dataframes. I previously wrote an answer on how to create a single master dataframe from multiple CSV files and add a column to the master dataframe containing a string extracted from the CSV filename. I think you could do essentially the same thing here.
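Steps:
- Create a path to the folder containing the files
- Create a list of the files in the folder
- Create an empty master dataframe to store the CSV dataframes
- Loop through the files, reading each CSV as a dataframe
- Add a column holding the filename as a string
- Concatenate each individual dataframe onto the master dataframe
- Use a dataframe filter mask to create a new dataframe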
import pandas as pd
import os

# Step 1: create a path to the folder, syntax for Windows OS
path_test_folder = 'C:\\test\\'

# Step 2: create a list of CSV files in the folder
files_in_folder = os.listdir(path_test_folder)
files_in_folder = [x for x in files_in_folder if x.endswith('.csv')]

# Step 3: create empty master dataframe to store CSV files
df_master = pd.DataFrame()

# Step 4: loop through the files in folder
for each_csv in files_in_folder:
    # temporary dataframe for the CSV
    path_csv = os.path.join(path_test_folder, each_csv)
    temp_df = pd.read_csv(path_csv)
    # add a column with the filename
    temp_df['str_filename'] = str(each_csv)
    # combine into master dataframe
    df_master = pd.concat([df_master, temp_df])

# then filter on your filenames
mask_filter = df_master['str_filename'].isin(['dat1990.csv', 'dat1991.csv'])
df_filter = df_master.loc[mask_filter]
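The filter above lists exact filenames, whereas the question asks for selection by a string sequence such as dat or 1991. The following is a rough sketch of how that could look, not a definitive solution. It assumes in-memory stand-in data shaped like the three sample files instead of reading C:\test, and it uses `Series.str.contains` for the substring match. The last line shows one possible way to get standalone names from the dictionary by tuple unpacking, which only works when the number of matches is known in advance.

```python
import pandas as pd

# Stand-in data (assumption): same columns and values as the three sample CSV files
df_dict = {
    'dat1990': pd.DataFrame({'Stock': [100, 120, 23], 'Sales': [24, 33, 5], 'Year': [1990] * 3}),
    'dat1991': pd.DataFrame({'Stock': [400, 450, 34], 'Sales': [35, 55, 6], 'Year': [1991] * 3}),
    'other1991': pd.DataFrame({'Stock': [500, 600, 56], 'Sales': [56, 44, 55], 'Year': [1991] * 3}),
}

# Master dataframe built with a single concat call; the source name goes in a column
df_master = pd.concat(
    [df.assign(str_filename=name + '.csv') for name, df in df_dict.items()],
    ignore_index=True,
)

# Substring match on the filename column instead of listing exact filenames
mask_dat = df_master['str_filename'].str.contains('dat', regex=False)
df_dat = df_master.loc[mask_dat]

mask_1991 = df_master['str_filename'].str.contains('1991', regex=False)
df_1991 = df_master.loc[mask_1991]

# If the dictionary is kept, the same test can be applied to its keys;
# tuple unpacking binds the matches to standalone names (two matches expected here)
dat1990, dat1991 = (df_dict[k] for k in sorted(df_dict) if k.startswith('dat'))
```

Creating variable names dynamically for thousands of dataframes is generally harder to maintain than keeping them in a dictionary or a single master dataframe, which is why the sketch unpacks only a small, known selection.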