How to create separate Pandas DataFrames for each CSV file and give them meaningful names?
Problem description
I've searched thoroughly and can't quite find the guidance I'm looking for on this issue, so I hope this question isn't redundant. I have several .csv files that represent raster images. I'd like to perform some statistical analysis on them, so I'm trying to create a Pandas DataFrame for each file so I can slice 'em, dice 'em, and plot 'em... but I'm having trouble looping through the list of files to create a DataFrame with a meaningful name for each file.
Here is what I have so far:
import glob
import os
from pandas import read_csv
# list of .csv files; I'd like to turn each file into a DataFrame
dataList = glob.glob(r'C:\Users\Charlie\Desktop\Qvik\textRasters\*.csv')
# name that I'd like to use for each DataFrame: the file name without ".csv"
nameList = []
for raster in dataList:
    name = os.path.splitext(os.path.basename(raster))[0]
    nameList.append(name)
# zip these lists into a dict of {name: path}
dataDct = {}
for k, v in zip(nameList, dataList):
    dataDct[k] = v
dataDct
So now I have a dict where the key is the name I want for each dataframe and the value is the path for read_csv(path):
{'Aspect': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Aspect.csv',
'Curvature': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Curvature.csv',
'NormalZ': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\NormalZ.csv',
'Slope': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Slope.csv',
'SnowDepth': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\SnowDepth.csv',
'Vegetation': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Vegetation.csv',
'Z': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Z.csv'}
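As an aside, the same name-to-path dict can be built in a single step. This is a minimal sketch that assumes Python 3.4+ for `pathlib`, where `Path.stem` is the file name without its extension. `csv_name_map` is a hypothetical helper name, and the temporary folder exists only so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def csv_name_map(folder):
    """Hypothetical helper: map each CSV's stem (name without .csv) to its full path."""
    return {p.stem: str(p) for p in sorted(Path(folder).glob("*.csv"))}


# Throwaway folder for demonstration; in practice, point this at the raster folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Aspect", "Slope", "Z"):
        (Path(tmp) / f"{name}.csv").write_text("a,b\n1,2\n")
    mapping = csv_name_map(tmp)

print(sorted(mapping))  # ['Aspect', 'Slope', 'Z']
```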
My instinct was to try variations of this:
for k, v in dataDct.items():
    k = read_csv(v)
but that leaves me with a single DataFrame, 'k', that is filled with data from the last file read in by the loop.
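The reason is that `k` is only a loop variable. Assigning to it rebinds that one name on every pass, so no variables called `Aspect`, `Slope`, etc. are ever created, and after the loop `k` holds whatever was read last. This is a minimal illustration that assumes Python 3.7+ (where dicts keep insertion order). `fake_read` is a hypothetical stand-in for `read_csv`, so it runs without any files.

```python
paths = {"Aspect": "Aspect.csv", "Slope": "Slope.csv", "Z": "Z.csv"}


def fake_read(path):
    # Hypothetical stand-in for read_csv so the example needs no files.
    return f"data from {path}"


for k, v in paths.items():
    k = fake_read(v)  # rebinds the loop variable; the key string is simply discarded

print(k)  # data from Z.csv
```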
I'm probably missing something fundamental here, but I'm starting to spin my wheels on this, so I thought I'd ask y'all... any ideas are appreciated!
Cheers.
Recommended answer
Are you trying to get all of the DataFrames separately in a dictionary, one DataFrame per key? If so, this will leave you with the dict you showed, except that each key will hold the DataFrame read from its file instead of the file's path.
dataDct = {}
for k, v in zip(nameList, dataList):
    dataDct[k] = read_csv(v)
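The same dict-of-DataFrames idea can be written as one dict comprehension. This is a sketch, not the answerer's code: it assumes Python 3 with `pathlib`, and `load_csvs` is a hypothetical helper name. The temporary files only make the example self-contained. `pd.read_csv` accepts a `Path` object directly.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csvs(folder):
    """Hypothetical helper: read every CSV in `folder` into a DataFrame keyed by file stem."""
    return {p.stem: pd.read_csv(p) for p in sorted(Path(folder).glob("*.csv"))}


# Throwaway files for demonstration; the DataFrames are fully read before the folder is deleted.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Slope.csv").write_text("x,y\n1,10\n2,20\n")
    (Path(tmp) / "SnowDepth.csv").write_text("x,y\n1,0.5\n2,0.7\n3,0.9\n")
    frames = load_csvs(tmp)

print({name: df.shape for name, df in frames.items()})  # {'Slope': (2, 2), 'SnowDepth': (3, 2)}
```

Keeping the DataFrames in a dict, rather than trying to create one variable per file, means they can still be looked up by their meaningful names, e.g. `frames['SnowDepth']`.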
So now, you could do this for example:
dataDct['SnowDepth'][['cola','colb']].plot()
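Because every DataFrame sits under its name in one dict, per-file statistics reduce to a loop over the dict. This is a minimal sketch of that idea: the small in-memory DataFrames stand in for ones loaded from the CSV files, and `cola`/`colb` are placeholder column names, as in the answer above.

```python
import pandas as pd

# In-memory stand-ins for DataFrames loaded from the CSV files (placeholder columns).
frames = {
    "Slope": pd.DataFrame({"cola": [1.0, 2.0, 3.0], "colb": [4.0, 5.0, 6.0]}),
    "SnowDepth": pd.DataFrame({"cola": [0.5, 0.7], "colb": [1.5, 1.9]}),
}

# Column means per file: one row per file name, one column per data column.
means = pd.DataFrame({name: df.mean() for name, df in frames.items()}).T
print(means)
```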