How can I iterate through multiple dataframes to select a column in each in python?
Problem description
For my project I'm reading in CSV files with data from every State in the US. My function converts each file into a separate Dataframe, as I need to perform operations on each State's information.
```python
import pandas as pd

def RanktoDF(csvFile):
    df = pd.read_csv(csvFile)
    df = df[pd.notnull(df['Index'])]  # drop rows whose 'Index' is null
    df = df[df.Index != 'Index']  # drop extra header rows repeated in the file
    df = df.set_index('State')  # set State as the index
    return df
```
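To see what the two filtering steps in RanktoDF do, here is a minimal sketch that runs the same function on a small in-memory CSV. The sample rows (an all-empty row and a repeated header line) and their values are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io

import pandas as pd


def RanktoDF(csvFile):
    """Same cleaning steps as the function in the question."""
    df = pd.read_csv(csvFile)
    df = df[pd.notnull(df["Index"])]  # drop rows whose 'Index' is missing
    df = df[df.Index != "Index"]      # drop header lines repeated inside the data
    df = df.set_index("State")        # use the state name as the row index
    return df


# Hypothetical sample: one all-empty row and one repeated header line.
sample = io.StringIO(
    "State,Index,Population\n"
    "Alabama,1,10\n"
    ",,\n"
    "State,Index,Population\n"
    "Alaska,2,20\n"
)

df = RanktoDF(sample)
print(df)
```

Because the repeated header line is read as data, pandas may load Index and Population as text rather than numbers, so a pd.to_numeric call may be needed before doing arithmetic on those columns.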
I apply this function to every one of my files and store each resulting df under a name built from my array varNames:
```python
import glob

for name, s in zip(glob.glob('*.csv'), varNames):
    vars()["Crime" + s] = RanktoDF(name)
```
All of that works perfectly. My problem is that I also want to create a Dataframe that's made up of one column from each of those State Dataframes.
I have tried iterating through a list of my dataframes, selecting the column I want (Population) and appending it to a new Dataframe:
```python
dfNewIndex = pd.DataFrame(index=CrimeRank_1980_df.index)  # Create new DF with Index
for name in dfList:  # dfList is my list of dataframes
    newIndex = name['Population']
    dfNewIndex.append(newIndex)
    # dfNewIndex = pd.concat([dfNewIndex, dfList[name['Population']], axis=1)
```
My error is always the same, which tells me that name is treated as a string rather than an actual Dataframe:
```
TypeError                                 Traceback (most recent call last)
<ipython-input-30-5aa85b0174df> in <module>()
      3
      4 for name in dfList:
----> 5     newIndex = name['Index']
      6     dfNewIndex.append(newIndex)
      7     # dfNewIndex = pd.concat([dfNewIndex, dfList[name['Population']], axis=1)

TypeError: string indices must be integers
```
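The error can be reproduced without any CSV files. In this sketch, dfList is a hypothetical list holding variable names, as described in the question; each element is a str, and indexing a str with another str raises TypeError.

```python
# Hypothetical stand-in for the question's dfList: it holds variable *names*
# (plain strings), not the DataFrame objects themselves.
dfList = ["CrimeRank_1980_df", "CrimeRank_1990_df"]

errors = []
for name in dfList:
    try:
        name["Population"]  # str["Population"] is not valid string indexing
    except TypeError as exc:
        errors.append(str(exc))

# Recent Python versions may append ", not 'str'" to the message.
print(errors)
```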
I understand that my list is a list of strings rather than variables/dataframes, so my question is: how can I correct my code to do what I want, or is there an easier way of doing this?
The solutions I've found all type out each dataframe explicitly in order to concatenate them, but I have 50, so that's a little unfeasible. Any help would be appreciated.
Answer
One way would be to index into vars(), e.g.:
```python
for name in dfList:
    newIndex = vars()[name]["Population"]
```
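A minimal sketch of that lookup, assuming the code runs at module level (a script or notebook cell), where vars() with no arguments returns the same dictionary as globals(). The sketch uses globals() explicitly, because inside a function vars() returns the local namespace and writing into it does not create variables. The frame names and values are hypothetical.

```python
import pandas as pd

# Hypothetical frames, created the way the question does: by writing into
# the module namespace under a generated name.
for s, pops in [("Rank_1980_df", [10, 20]), ("Rank_1990_df", [11, 21])]:
    globals()["Crime" + s] = pd.DataFrame(
        {"Population": pops}, index=["Alabama", "Alaska"]
    )

dfList = ["CrimeRank_1980_df", "CrimeRank_1990_df"]  # names, i.e. strings

for name in dfList:
    # Look the name up in the namespace first, then select the column.
    newIndex = globals()[name]["Population"]
    print(name, newIndex.tolist())
```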
Alternatively, I think it would be neater to store your dataframes in a container (such as a dict) and iterate through that, e.g.:
```python
frames = {}
for name, s in zip(glob.glob('*.csv'), varNames):
    frames["Crime" + s] = RanktoDF(name)

for name in frames:
    newIndex = frames[name]["Population"]
```
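Building on the frames dict, the combined Dataframe the question asks for (one Population column per file) can be sketched with pd.concat: given a dict of Series and axis=1, it uses the dict keys as column labels. The frames and values below are hypothetical stand-ins for what RanktoDF would return.

```python
import pandas as pd

states = pd.Index(["Alabama", "Alaska"], name="State")

# Hypothetical stand-in for the `frames` dict (one DataFrame per CSV file).
frames = {
    "CrimeRank_1980_df": pd.DataFrame({"Population": [10, 20]}, index=states),
    "CrimeRank_1990_df": pd.DataFrame({"Population": [11, 21]}, index=states),
}

# Pull the same column out of every frame; concatenating along axis=1 aligns
# the Series on the State index and names each column after its dict key.
dfNewIndex = pd.concat(
    {name: df["Population"] for name, df in frames.items()}, axis=1
)
print(dfNewIndex)
```

pd.concat aligns on the State index, so a state missing from one file would get NaN in that column. This also replaces the question's dfNewIndex.append(newIndex) call, which returned a new object instead of modifying dfNewIndex in place and was removed from DataFrame in pandas 2.0.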