Using Pandas to Iteratively Add Columns to a Dataframe


Question

I'm struggling to put together some relatively simple code. I have a CSV that I've read into a dataframe. The CSV is panel data (i.e., each row is a unique company-year observation). I have two columns that I want to run a function on, and I then want to create new variables from the function's output.

Here's the code I have so far:

# Loop through the rows read from the CSV file
for index, rows in df.iterrows():
    # Start at column 6 and go to the end of the file
    for row in rows[6:]:
        data = perform_function1(row)
        output = perform_function2(data)
        df.loc[index, 'new_variable'] = output
        print(output)

I want this code to start at column 6 and continue to the end of the file (e.g., I have two columns I want to run the function on, Column6 and Column7) and then create new columns from the results (e.g., Output6 and Output7). The code above returns the output for Column7 only, and I can't figure out how to create a variable that captures the outputs from both columns (i.e., a new variable that isn't overwritten on each pass through the loop). I searched Stack Overflow and didn't see anything that immediately related to my question (maybe because I'm too big of a noob?). I would really appreciate your help.

Thanks,

TT

P.S. I'm not sure if I've provided enough detail. Please let me know if I need to provide more.

Recommended answer

Operating iteratively doesn't take advantage of Pandas' capabilities. Pandas' strength is in applying operations across whole columns at once, rather than in iterating row by row. It's well suited to a task like this, where you want to chain a few functions across your data. You should be able to accomplish the whole task with one short statement per output column.

for col in df.columns[6:]:
    df["Output_" + str(col)] = df[col].apply(perform_function1).apply(perform_function2)

perform_function1 will be applied to each value in the column, and perform_function2 will be applied to the results of the first function. Because each result is written to its own column (the "Output_" prefix is just an illustrative naming choice), the output for Column6 is no longer overwritten by the output for Column7.
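To make the approach concrete, here is a minimal, self-contained sketch. The sample data, the bodies of perform_function1 and perform_function2, the helper name add_output_columns, and the "Output_" prefix are all assumptions made up for illustration; the original post does not show the real functions or data.

```python
import pandas as pd


# Hypothetical stand-ins for the asker's functions (not from the original post).
def perform_function1(value):
    return value * 2


def perform_function2(value):
    return value + 1


def add_output_columns(frame, start=6):
    """Return a copy of frame with one Output_<name> column per column from position start on."""
    result = frame.copy()
    # The column list is taken from the original frame, so newly added
    # output columns are never fed back into the loop.
    for col in frame.columns[start:]:
        result["Output_" + str(col)] = (
            frame[col].apply(perform_function1).apply(perform_function2)
        )
    return result


# Made-up panel-style data: six leading columns, then the two columns of interest.
df = pd.DataFrame({
    "company": ["A", "A", "B", "B"],
    "year": [2001, 2002, 2001, 2002],
    "c2": 0, "c3": 0, "c4": 0, "c5": 0,
    "Column6": [1, 2, 3, 4],
    "Column7": [10, 20, 30, 40],
})

out = add_output_columns(df)
print(out[["Column6", "Output_Column6", "Column7", "Output_Column7"]])
```

The loop here runs over column names only, not over rows, so the per-value work is still handed to Series.apply one column at a time. If the two functions can be written with vectorized arithmetic instead, the apply calls could likely be dropped altogether.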
