Passing Nested Dictionary to EMR via add_job_flow_steps

Problem description

I create a Python dictionary called my_dict with some metadata and convert it to a string via json.dumps(). The resulting string is then passed to EMR via add_job_flow_steps as one of the Args in HadoopJarStep.
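As a rough sketch of the setup described here, the step definition might be built as follows. The step name, the spark-submit launcher, the S3 path, and the cluster ID in the trailing comment are hypothetical placeholders, not details from the post. Only json.dumps(), the --my_dict flag, and the HadoopJarStep Args list come from the description.

```python
import json


def build_step(my_dict):
    """Build one EMR step definition that carries my_dict as a JSON string."""
    return {
        "Name": "run-execute-py",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",  # assumption: the post does not say how execute.py is launched
                "s3://my-bucket/execute.py",  # hypothetical script location
                "--my_dict",
                json.dumps(my_dict),  # the dictionary travels as a single string argument
            ],
        },
    }


step = build_step({"level_one_key": {"level_two_key": "level_two_value"}})

# Submitting the step would then look roughly like this
# (requires boto3 and AWS credentials; the cluster ID is a placeholder):
#   import boto3
#   emr = boto3.client("emr")
#   emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Building the step as a plain dictionary first makes it easy to inspect the exact string that will be sent before anything reaches EMR.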

Once the step has been submitted, the arguments look like this in the Arguments section of the EMR step in the UI:

--my_dict "{\"level_one_key\": {\"level_two_key\": \"level_two_value\"}}"

I also pass EMR a Python file to run, called execute.py. The arguments above are passed to execute.py's main function, where the value is immediately converted into a dictionary with json.loads(). The parser definition looks like:

parser.add_argument('--my_dict', type=json.loads, required=False)
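For reference, a minimal local sketch of this parser is shown here. The parse_args wrapper is a hypothetical helper added for illustration; the add_argument call itself is the one from the question.

```python
import argparse
import json


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # argparse calls json.loads on the raw string and stores the resulting dict
    parser.add_argument("--my_dict", type=json.loads, required=False)
    return parser.parse_args(argv)


args = parse_args(["--my_dict", '{"level_one_key": {"level_two_key": "level_two_value"}}'])
print(args.my_dict["level_one_key"]["level_two_key"])  # prints level_two_value
```

Run locally with the complete string, the nested dictionary parses without trouble, which suggests the parser definition alone is not what breaks the step.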

The problem: when I pass a nested dictionary, the step fails 20 seconds in with UNKNOWN ERROR as the reason, and no logs are written whatsoever. :(

However, when I pass it as a flat dictionary, like:

--my_dict "{\"level_one_key\": \"level_one_value\"}"

it works fine.

I don't want to post too much of my code, because this is work-related. But am I missing something? I feel like I should be able to pass a nested dictionary with no problem. I have also tried converting the dictionary after it has been passed to main, like this:

parser.add_argument('--my_dict', type=str, required=False)

my_dict = json.loads(args.my_dict)

Yet it still fails. Any ideas?

Update: when the nested dictionary is parsed as a str in execute.py (as opposed to using json.loads) and then printed, it looks like:

{"level_one_key": {"level_two_key": "level_two_value"

It's missing the last two closing braces of the dictionary, for whatever reason. Obviously, this is what causes the error, but I don't know why the entire dictionary isn't being passed into main.
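The parse failure can be reproduced locally with the truncated string from the update. This is only a sketch: the variable names and the message format are illustrative assumptions, and the snippet says nothing about why EMR drops the braces. It does show that printing the raw value alongside the exception makes this kind of truncation visible in the step's output.

```python
import json

# The value as it arrived in execute.py, according to the update
truncated = '{"level_one_key": {"level_two_key": "level_two_value"'

try:
    json.loads(truncated)
    parsed_ok = True
except json.JSONDecodeError as exc:
    parsed_ok = False
    # Including the raw string shows exactly what the script received
    print(f"could not parse --my_dict: {exc}; raw value was {truncated!r}")
```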

Recommended answer

I was able to resolve this by adding another key-value pair, one that is not nested, to the end of the dictionary. Like this:

--my_arg "{\"level_one_key\": {\"level_two_key\": \"level_two_value\"}, \"level_one_second_key\": \"level_one_second_value\"}"
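This workaround could be wrapped in a pair of small helpers, sketched here. The answer does not explain why the extra pair helps. The sketch assumes the relevant factor is that the serialized string no longer ends in two consecutive closing braces, which is consistent with the truncation seen in the update but is not confirmed by the post. The "_padding" key and both function names are hypothetical.

```python
import json

PADDING_KEY = "_padding"  # hypothetical key name; choose one that cannot clash with real data


def dumps_with_trailing_flat_key(my_dict):
    """Serialize my_dict with a flat key-value pair appended as the last entry."""
    padded = dict(my_dict)
    padded[PADDING_KEY] = ""  # dicts keep insertion order, so a new key is serialized last
    return json.dumps(padded)


def loads_without_padding(raw):
    """Parse the string on the receiving side and drop the padding key again."""
    parsed = json.loads(raw)
    parsed.pop(PADDING_KEY, None)
    return parsed


original = {"level_one_key": {"level_two_key": "level_two_value"}}
encoded = dumps_with_trailing_flat_key(original)
decoded = loads_without_padding(encoded)
```

On the sending side, encoded would replace the plain json.dumps(my_dict) value in Args. In execute.py, loads_without_padding could be used as the type for the argument in place of json.loads.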
