How can I iterate through multiple dataframes to select a column in each in python?

Problem description

For my project I'm reading in CSV files with data from every state in the US. My function converts each file into a separate DataFrame, as I need to perform operations on each state's information.

import pandas as pd

def RanktoDF(csvFile):
    df = pd.read_csv(csvFile)
    df = df[pd.notnull(df['Index'])]  # drop rows whose Index is null
    df = df[df.Index != 'Index']      # drop extra header rows repeated inside the file
    df = df.set_index('State')        # set State as the index
    return df
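To see what the two filters in RanktoDF are for, here is a minimal sketch that runs the same steps on a small in-memory CSV. The sample rows are made up, and the sketch assumes (as the second filter suggests) that the real files contain stray copies of the header row; `rank_to_df` is a hypothetical copy of RanktoDF that also accepts a file-like object.

```python
import io

import pandas as pd


def rank_to_df(csv_source):
    """Same cleaning steps as RanktoDF; accepts a path or any file-like object."""
    df = pd.read_csv(csv_source)
    df = df[pd.notnull(df["Index"])]   # drop rows with no Index value
    df = df[df["Index"] != "Index"]    # drop header rows repeated inside the file
    return df.set_index("State")       # use the State column as the row index


# Made-up sample: one repeated header row and one empty row.
sample = io.StringIO(
    "State,Index,Population\n"
    "Alabama,1,100\n"
    "Alaska,2,200\n"
    "State,Index,Population\n"
    ",,\n"
    "Arizona,3,300\n"
)

df = rank_to_df(sample)
print(df)
```

Because the stray header rows force pandas to read `Index` and `Population` as text, those columns remain strings after cleaning; `pd.to_numeric` can convert them if arithmetic is needed.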

I apply this function to every one of my files and store each resulting DataFrame under a name built from my array varNames:

import glob

for name, s in zip(glob.glob('*.csv'), varNames):
    vars()["Crime" + s] = RanktoDF(name)  # creates variables such as CrimeRank_1980_df
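The zip in this loop pairs each file with a name from varNames purely by position, and `glob.glob` does not guarantee any particular order (the Python docs say sorting depends on the file system). A minimal sketch of an alternative, assuming the name can be derived from each file's own name: the file names and contents below are hypothetical, and plain `pd.read_csv` stands in for RanktoDF to keep the example self-contained.

```python
import glob
import os
import tempfile

import pandas as pd

frames = {}

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical setup: two small files named after the frames they will become.
    for stem, pop in [("Rank_1980_df", 100), ("Rank_1990_df", 150)]:
        with open(os.path.join(tmpdir, stem + ".csv"), "w") as fh:
            fh.write(f"State,Index,Population\nAlabama,1,{pop}\n")

    # sorted() fixes the order, and each key is built from the file's own name,
    # so no separate list of names can fall out of step with the files.
    for path in sorted(glob.glob(os.path.join(tmpdir, "*.csv"))):
        stem = os.path.splitext(os.path.basename(path))[0]
        # In the real code this would be RanktoDF(path).
        frames["Crime" + stem] = pd.read_csv(path).set_index("State")

print(list(frames))
```

Storing the frames in a dict rather than creating variables through `vars()` also keeps them easy to loop over later.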

All of that works perfectly. My problem is that I also want to create a DataFrame that's made up of one column from each of those state DataFrames.

I have tried iterating through a list of my DataFrames, selecting the column I want (Population) and appending it to a new DataFrame:

[Image: the contents of dfList]

dfNewIndex = pd.DataFrame(index=CrimeRank_1980_df.index) # Create new DF with Index


for name in dfList:  #dfList is my list of dataframes. See image
    newIndex = name['Population']
    dfNewIndex.append(newIndex)

    #dfNewIndex = pd.concat([dfNewIndex, dfList[name['Population']], axis=1)

My error is always the same, and it tells me that name is treated as a string rather than an actual DataFrame:

TypeError                                 Traceback (most recent call last)
<ipython-input-30-5aa85b0174df> in <module>()
      3 
      4 for name in dfList:
----> 5     newIndex = name['Index']
      6     dfNewIndex.append(newIndex)
      7 #     dfNewIndex = pd.concat([dfNewIndex, dfList[name['Population']], axis=1)

TypeError: string indices must be integers

I understand that my list is a list of strings rather than variables/DataFrames, so my question is: how can I correct my code to do what I want, or is there an easier way of doing this?
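That diagnosis is correct: iterating over a list of names yields `str` objects, and `'CrimeRank_1980_df'['Population']` is string indexing, which only accepts integers or slices. A minimal sketch with two made-up frames, contrasting a list of names with a list that holds the DataFrames themselves:

```python
import pandas as pd

# Two made-up frames standing in for CrimeRank_1980_df, CrimeRank_1990_df, ...
a = pd.DataFrame({"Population": [100, 200]}, index=["Alabama", "Alaska"])
b = pd.DataFrame({"Population": [110, 210]}, index=["Alabama", "Alaska"])

names = ["a", "b"]      # a list of *names*: each item is a str
error_message = None
try:
    names[0]["Population"]          # str indexed by str -> TypeError
except TypeError as exc:
    error_message = str(exc)

objects = [a, b]        # a list of the DataFrame objects themselves
first_column = objects[0]["Population"]   # works: returns a Series
print(error_message)
print(first_column)
```

So if dfList is built by appending the objects returned by RanktoDF (rather than their names), the original loop body can index each item directly.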

Every solution I've looked up types out the DataFrames explicitly in order to concatenate them, but I have 50, so that's a little unfeasible. Any help would be appreciated.

Solution

One way would be to index into vars(), e.g.

for name in dfList:
    newIndex = vars()[name]["Population"]
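A fuller sketch of the vars() approach that also replaces `dfNewIndex.append(newIndex)` from the question: `DataFrame.append` returned a new object instead of modifying `dfNewIndex` in place (and it was removed in pandas 2.0), so that loop would not have accumulated anything even with real DataFrames. The frames below are made-up stand-ins, and the sketch assumes the lookup runs at module level, as in a notebook cell, where `vars()` returns the module's global namespace.

```python
import pandas as pd

# Made-up stand-ins for the frames created with vars()["Crime" + s].
CrimeRank_1980_df = pd.DataFrame({"Population": [100, 200]}, index=["Alabama", "Alaska"])
CrimeRank_1990_df = pd.DataFrame({"Population": [110, 210]}, index=["Alabama", "Alaska"])

dfList = ["CrimeRank_1980_df", "CrimeRank_1990_df"]  # names, as in the question

columns = []
for name in dfList:
    # At module level vars() is the global namespace, so the string finds the frame;
    # rename() labels each Series with the frame it came from.
    columns.append(vars()[name]["Population"].rename(name))

# One concat call replaces the repeated append; the Series align on the State index.
dfNewIndex = pd.concat(columns, axis=1)
print(dfNewIndex)
```

Building a list and concatenating once is also cheaper than growing a DataFrame inside the loop, since each concatenation copies the data.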

Alternatively, I think it would be neater to store your DataFrames in a container such as a dict and iterate through that, e.g.

frames = {}

for name, s in zip(glob.glob('*.csv'), varNames):
    frames["Crime" + s] = RanktoDF(name)

for name in frames:
    newIndex = frames[name]["Population"]
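With the frames in a dict, the combined DataFrame can be built in one step, with no need to type out all 50 names. A minimal sketch with made-up frames; passing a dict to `pd.concat` uses its keys as the labels along the concatenation axis, which with `axis=1` become the column names:

```python
import pandas as pd

# Made-up frames standing in for the ones RanktoDF would return.
frames = {
    "CrimeRank_1980_df": pd.DataFrame({"Population": [100, 200]}, index=["Alabama", "Alaska"]),
    "CrimeRank_1990_df": pd.DataFrame({"Population": [110, 210]}, index=["Alabama", "Alaska"]),
}

# One Population Series per frame; the dict keys become the column labels.
population = pd.concat(
    {name: df["Population"] for name, df in frames.items()}, axis=1
)
print(population)
```

By default `pd.concat` performs an outer join on the index, so a state missing from one file shows up as NaN in that column rather than being dropped.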
