Python variable scope approach


Question

I currently have this Python code (I'm using Apache Spark, but I'm pretty sure that doesn't matter for this question):

import numpy as np
import pandas as pd
from sklearn import feature_extraction
from sklearn import tree
from pyspark import SparkConf, SparkContext

## Module Constants
APP_NAME = "My Spark Application"
df = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

def train_tree():
    # Do more stuff with the data, call other functions
    pass

def main(sc):
    cat_columns = ["Sex", "Pclass"]

    # PROBLEM IS HERE
    cat_dict = df[cat_columns].to_dict(orient='records')

    vec = feature_extraction.DictVectorizer()
    cat_vector = vec.fit_transform(cat_dict).toarray()

    df_vector = pd.DataFrame(cat_vector)
    vector_columns = vec.get_feature_names()
    df_vector.columns = vector_columns
    df_vector.index = df.index

    # train data

    df = df.drop(cat_columns, axis=1)
    df = df.join(df_vector)

    train_tree()

if __name__ == "__main__":
    # Configure Spark    
    conf = SparkConf().setAppName(APP_NAME)
    conf = conf.setMaster("local[*]")
    sc   = SparkContext(conf=conf)

    # Execute Main functionality
    main(sc)

When I run it, I get the error:

    cat_dict = df[cat_columns].to_dict(orient='records')
UnboundLocalError: local variable 'df' referenced before assignment

I find this puzzling because I define the variable df outside the main function's scope, at the top of the file. Why would using this variable inside the function trigger this error? I have also tried putting the df definition inside the if __name__ == "__main__": block (before the main function is called).

Now, obviously there are lots of ways I could solve this, but this is more about helping me understand Python better. So I want to ask:

a) Why does this error occur at all?

b) How best to solve it, given that:

- I don't want to put the df definition inside the main function, because I want to access it in other functions.
- I don't want to use a class.
- I don't want to use a global variable.
- I don't want to pass df around in function parameters.

Answer

I think it's worth summarizing the comments into a detailed answer for future readers of this question.

The reason the UnboundLocalError is thrown here is the way Python function scoping works. Although my df variable is defined outside the main function, at the uppermost (module) scope, attempting to re-assign it inside main causes the error. This excellent answer puts it nicely; to paraphrase:

Now we get to df = df.drop(cat_columns, axis=1). When Python scans that line, it says "aha, there's a variable named df, I'll put it into my local scope dictionary." Then, when it goes looking for a value for the df on the right-hand side of the assignment, it finds its local variable named df, which has no value yet, and so throws the error.
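The same rule can be shown without pandas or Spark at all. A minimal sketch (the function names reader and rebinder are made up for illustration): any assignment to a name anywhere in a function body makes that name local for the entire function, even before the assignment line is reached.

```python
x = 10  # module-level name, playing the role of df in the question

def reader():
    return x + 1  # only reads x, so Python falls back to the global: fine

def rebinder():
    result = x + 1  # UnboundLocalError raised HERE, because the
    x = result      # assignment on this line makes x local to the
    return x        # whole function body, including the line above

print(reader())  # 11
try:
    rebinder()
except UnboundLocalError:
    print("x was treated as local before it had a value")
```

Note that the error is decided at compile time, when Python classifies each name in the function, not at the moment the assignment line runs.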

To fix my code, I made the following change:

def main(sc):

    cat_columns = ["Sex", "Pclass", "SibSp"]
    cat_dict = df[cat_columns].to_dict(orient='records')

    vec = feature_extraction.DictVectorizer()
    cat_vector = vec.fit_transform(cat_dict).toarray()

    df_vector = pd.DataFrame(cat_vector)
    vector_columns = vec.get_feature_names()
    df_vector.columns = vector_columns
    df_vector.index = df.index

    # train data

    df_updated = df.drop(cat_columns, axis=1) # This used to be df = df.drop(cat_columns, axis=1) 
    df_updated = df_updated.join(df_vector)

    train_tree(df_updated) # passing the df_updated to the function

This removes the UnboundLocalError. To keep using the df variable in other functions, I pass it in as a parameter (albeit under a different name), which means train_tree now has to accept a DataFrame argument. This could get confusing, so, as suggested by @Padraic Cunningham, you could instead pass the variable into the main function:

if __name__ == "__main__":
    # Configure Spark

    conf = SparkConf().setAppName(APP_NAME)
    conf = conf.setMaster("local[*]")
    sc   = SparkContext(conf=conf)
    df = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")

    # df.Age = df.Age.astype(int)
    # test.Age = test.Age.astype(int)

    # Execute Main functionality
    main(sc, df) # main's signature changes accordingly, to def main(sc, df):

Other options would be to use a class, or to use a global variable. I felt those two options were overkill (a class) or inelegant (a global). However, this is purely personal taste.
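For completeness, the global-variable option the answer rejects as inelegant would look like the sketch below (a plain dict stands in for the DataFrame, so the example is self-contained):

```python
df = {"Sex": "female", "Pclass": 3}  # stand-in for pd.read_csv("train.csv")

def main():
    global df           # declare that assignments rebind the module-level df
    df = {"Pclass": 3}  # re-assignment now targets the global: no error
    return df

main()
print(df)  # {'Pclass': 3}
```

The global statement tells the compiler not to treat df as local despite the assignment, which is exactly the classification that triggered the original error.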
