Extracting dataframes from a dictionary of dataframes


Question

I have a directory containing many CSV files, which I have loaded into a dictionary of dataframes.

Here are just three small sample CSV files to illustrate:

    import os
    import csv
    import pandas as pd

    #create 3 small csv files for test purposes
    os.chdir('c:/test')
    with open('dat1990.csv','w',newline='') as fp:
        a=csv.writer(fp,delimiter=',')
        data = [['Stock','Sales','Year'],
                ['100','24','1990'],
                ['120','33','1990'],
                ['23','5','1990']]
        a.writerows(data)

    with open('dat1991.csv','w',newline='') as fp:
        a=csv.writer(fp,delimiter=',')
        data = [['Stock','Sales','Year'],
                ['400','35','1991'],
                ['450','55','1991'],
                ['34','6','1991']]
        a.writerows(data)

    with open('other1991.csv','w',newline='') as fp:
        a=csv.writer(fp,delimiter=',')
        data = [['Stock','Sales','Year'],
                ['500','56','1991'],
                ['600','44','1991'],
                ['56','55','1991']]
        a.writerows(data)

Create a dictionary that maps a name to each CSV file to be loaded into a dataframe:

    dfcsv_dict = {'dat1990': 'dat1990.csv', 'dat1991': 'dat1991.csv', 
        'other1991': 'other1991.csv'}

Create a simple function for importing a CSV file into pandas:

    def myimport(csvfile):
        return pd.read_csv(csvfile)

Iterate through the dictionary to import all the CSV files into pandas dataframes:

    df_dict = {}
    for k, v in dfcsv_dict.items():
        df_dict[k] = myimport(v)

Given that I may now have thousands of dataframes in a single dictionary object, how can I select a few and "extract" them from the dictionary?

For example, how would I extract just two of the three dataframes stored in the dictionary, something like

    dat1990 = df_dict['dat1990']
    dat1991 = df_dict['dat1991']

but without using literal assignments? Maybe some sort of looping structure over the dictionary, ideally with a way to select a subgroup based on a string sequence in the dictionary key, e.g. all dataframes whose names contain `dat` or `1991`.

I don't want another "sub-dictionary"; I want to extract them as named, "standalone" dataframes, as the code above illustrates.

I am using Python 3.5.

Recommended answer

This is an old question from January 2016, but since no one has answered it, here is an answer from October 2019. It might be useful for future reference.

I think you can skip the step of creating a dictionary of dataframes. I previously wrote an answer on how to create a single master dataframe from multiple CSV files and add a column to it containing a string extracted from each CSV filename. I think you could do essentially the same thing here.

Create dataframe of csv files based on timestamp intervals

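Steps:

  1. Create a path to the folder containing the files
  2. Create a list of the files in the folder
  3. Create an empty dataframe to store the CSV dataframes
  4. Loop through each CSV, reading it as a dataframe
  5. Add a column containing the filename as a string
  6. Concatenate the individual dataframe onto the master dataframe
  7. Use a dataframe filter mask to create a new dataframe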

    import os
    import pandas as pd

    # Step 1: create a path to the folder, syntax for Windows OS
    path_test_folder = 'C:\\test\\'

    # Step 2: create a list of CSV files in the folder
    files_in_folder = os.listdir(path_test_folder)
    files_in_folder = [x for x in files_in_folder if x.endswith('.csv')]

    # Step 3: create empty master dataframe to store CSV files
    df_master = pd.DataFrame()

    # Step 4: loop through the files in the folder
    for each_csv in files_in_folder:

        # temporary dataframe for the CSV
        path_csv = os.path.join(path_test_folder, each_csv)
        temp_df = pd.read_csv(path_csv)

        # Step 5: add a column with the filename
        temp_df['str_filename'] = each_csv

        # Step 6: combine into master dataframe, renumbering the rows
        df_master = pd.concat([df_master, temp_df], ignore_index=True)

    # Step 7: then filter on your filenames
    mask_filter = df_master['str_filename'].isin(['dat1990.csv', 'dat1991.csv'])
    df_filter = df_master.loc[mask_filter]
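
If you would rather select by a string sequence in the filename than list every file, the same mask idea should carry over. The following is only a minimal sketch: it assumes the `str_filename` column created above, rebuilds `df_master` in memory from the question's sample values so it runs without the CSV files, and uses illustrative names (`mask_1991`, `df_1991`, `mask_dat`, `df_dat`) that are not part of the original code.

```python
import pandas as pd

# Stand-in for df_master, built in memory from the question's sample data
# (assumption: same columns as produced by the loop that reads the CSV files).
df_master = pd.DataFrame({
    'Stock': [100, 120, 23, 400, 450, 34, 500, 600, 56],
    'Sales': [24, 33, 5, 35, 55, 6, 56, 44, 55],
    'Year': [1990] * 3 + [1991] * 6,
    'str_filename': (['dat1990.csv'] * 3
                     + ['dat1991.csv'] * 3
                     + ['other1991.csv'] * 3),
})

# Rows whose filename contains '1991' (plain substring match, not a regex)
mask_1991 = df_master['str_filename'].str.contains('1991', regex=False)
df_1991 = df_master.loc[mask_1991]

# Rows whose filename starts with 'dat'
mask_dat = df_master['str_filename'].str.startswith('dat')
df_dat = df_master.loc[mask_dat]

# Split the selection back into one standalone dataframe per file.
# groupby sorts its keys, so the groups arrive as dat1990.csv, dat1991.csv.
dat1990, dat1991 = (group.drop(columns='str_filename')
                    for _, group in df_dat.groupby('str_filename'))
```

The tuple unpacking on the last line only works when you know how many files match. If you keep the question's `df_dict` instead, a comprehension such as `{k: v for k, v in df_dict.items() if 'dat' in k}` selects on the keys in the same way, although it gives you a smaller dictionary rather than standalone names.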
