Extracting dataframes from a dictionary of dataframes

Views: 169  Posted: 2020/5/5 13:30:05  Tags: dictionary, pandas


Problem description

I have a directory containing many CSV files, which I have loaded into a dictionary of dataframes.

Here are just three small sample CSV files to illustrate:

    import os
    import csv
    import pandas as pd

    #create 3 small csv files for test purposes
    os.chdir('c:/test')
    with open('dat1990.csv','w',newline='') as fp:
        a=csv.writer(fp,delimiter=',')
        data = [['Stock','Sales','Year'],
                ['100','24','1990'],
                ['120','33','1990'],
                ['23','5','1990']]
        a.writerows(data)

    with open('dat1991.csv','w',newline='') as fp:
        a=csv.writer(fp,delimiter=',')
        data = [['Stock','Sales','Year'],
                ['400','35','1991'],
                ['450','55','1991'],
                ['34','6','1991']]
        a.writerows(data)

    with open('other1991.csv','w',newline='') as fp:
        a=csv.writer(fp,delimiter=',')
        data = [['Stock','Sales','Year'],
                ['500','56','1991'],
                ['600','44','1991'],
                ['56','55','1991']]
        a.writerows(data)

Create a dictionary that maps a name to each CSV file to be read into a dataframe:

    dfcsv_dict = {'dat1990': 'dat1990.csv',
                  'dat1991': 'dat1991.csv',
                  'other1991': 'other1991.csv'}

Create a simple function for importing a CSV file into pandas:

    def myimport(csvfile):
        return pd.read_csv(csvfile)

Iterate through the dictionary to import all CSV files into pandas dataframes:

    df_dict = {}
    for k, v in dfcsv_dict.items():
        df_dict[k] = myimport(v)

Given that I may now have thousands of dataframes within the unified dictionary object, how can I select a few and "extract" them out of the dictionary?

So, for example, how would I extract just two of these three dataframes nested in the dictionary, something like

    dat1990 = df_dict['dat1990']
    dat1991 = df_dict['dat1991']

but without using literal assignments? Maybe some sort of looping structure over the dictionary, ideally with a way to select a subgroup based on a string sequence in the dictionary key, e.g. all dataframes whose names contain dat or 1991.

I don't want another "sub-dictionary"; I want to extract them as named, standalone dataframes, as the code above illustrates.

I am using Python 3.5.

Recommended answer

This is an old question from Jan 2016, but since no one answered, here is an answer from Oct 2019. It might be useful for future reference.

I think you can skip the step of creating a dictionary of dataframes. I previously wrote an answer on how to create a single master dataframe from multiple CSV files and add a column to it containing a string extracted from each CSV filename. I think you could do essentially the same thing here.
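If you would rather keep the dictionary from the question, a subgroup can still be picked by key substring and bound to standalone names through tuple unpacking. The following is only a minimal sketch: `select_frames` is a hypothetical helper, not part of pandas, and `df_dict` is rebuilt in memory from the question's sample values so the snippet runs without the CSV files.

```python
import pandas as pd

# Stand-in for the question's df_dict, built in memory from the sample values.
df_dict = {
    'dat1990': pd.DataFrame({'Stock': [100, 120, 23],
                             'Sales': [24, 33, 5],
                             'Year': [1990, 1990, 1990]}),
    'dat1991': pd.DataFrame({'Stock': [400, 450, 34],
                             'Sales': [35, 55, 6],
                             'Year': [1991, 1991, 1991]}),
    'other1991': pd.DataFrame({'Stock': [500, 600, 56],
                               'Sales': [56, 44, 55],
                               'Year': [1991, 1991, 1991]}),
}


def select_frames(frames, substring):
    """Hypothetical helper: frames whose key contains `substring`, ordered by key."""
    return [frames[key] for key in sorted(frames) if substring in key]


# Tuple unpacking binds standalone names without indexing each key by hand.
# The number of names on the left must match the number of matching keys.
dat1990, dat1991 = select_frames(df_dict, 'dat')

# With an unknown number of matches, keep the result as a list instead.
frames_1991 = select_frames(df_dict, '1991')
```

Sorting the keys makes the unpacking order predictable. With thousands of frames, a list or dictionary of matches is usually easier to manage than thousands of separate variable names.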

The earlier answer referred to above is "Create a dataframe of CSV files based on timestamp intervals".

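Steps:

1. Create a path to the folder containing the files
2. Create a list of the files in the folder
3. Create an empty dataframe to store the CSV dataframes
4. Loop through each CSV file, reading it as a dataframe
5. Add a column containing the filename as a string
6. Concatenate each individual dataframe onto the master dataframe
7. Use a dataframe filter mask to create a new dataframe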

In code, these steps look like this:

    import pandas as pd
    import os

    # Step 1: create a path to the folder, syntax for Windows OS
    path_test_folder = 'C:\\test\\'

    # Step 2: create a list of CSV files in the folder
    files_in_folder = os.listdir(path_test_folder)
    files_in_folder = [x for x in files_in_folder if x.endswith('.csv')]

    # Step 3: create empty master dataframe to store CSV files
    df_master = pd.DataFrame()

    # Step 4: loop through the files in folder
    for each_csv in files_in_folder:

        # temporary dataframe for the CSV
        path_csv = os.path.join(path_test_folder, each_csv)
        temp_df = pd.read_csv(path_csv)

        # add a column with the filename
        temp_df['str_filename'] = str(each_csv)

        # combine into master dataframe
        df_master = pd.concat([df_master, temp_df])

    # then filter on your filenames
    mask_filter = df_master['str_filename'].isin(['dat1990.csv', 'dat1991.csv'])
    df_filter = df_master.loc[mask_filter]
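The question also asks about selecting by a string sequence such as dat or 1991 rather than by an exact list of names. With the filename stored in a column, that can be sketched with the pandas string methods. In this example `df_master` is rebuilt in memory from the question's sample values (an assumption, so the snippet runs without the CSV files), and the `per_file` dictionary is an illustrative name only.

```python
import pandas as pd

# Stand-in for df_master as produced by the loop, using the question's sample values.
df_master = pd.DataFrame({
    'Stock': [100, 120, 23, 400, 450, 34, 500, 600, 56],
    'Sales': [24, 33, 5, 35, 55, 6, 56, 44, 55],
    'Year': [1990] * 3 + [1991] * 6,
    'str_filename': (['dat1990.csv'] * 3
                     + ['dat1991.csv'] * 3
                     + ['other1991.csv'] * 3),
})

# Rows from every file whose name contains '1991' (plain substring, not a regex).
mask_1991 = df_master['str_filename'].str.contains('1991', regex=False)
df_1991 = df_master.loc[mask_1991]

# Rows from every file whose name starts with 'dat'.
mask_dat = df_master['str_filename'].str.startswith('dat')
df_dat = df_master.loc[mask_dat]

# If separate frames are needed again, split the master frame by source file.
per_file = {name: group.drop(columns='str_filename')
            for name, group in df_master.groupby('str_filename')}
```

Each mask is a boolean Series aligned with `df_master`, so masks can be combined with `&` and `|` when several conditions apply at once.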
