Passing a Nested Dictionary to EMR via add_job_flow_steps
Problem description
I create a Python dictionary called my_dict with some metadata and convert it to a string via json.dumps(). The resulting string is then passed to EMR via add_job_flow_steps as args in HadoopJarStep.
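For reference, the submission described above might look roughly like the sketch below. The question does not show this code, so the step name, the S3 path, the use of command-runner.jar with spark-submit, and the helper names build_step and submit are all assumptions made for illustration.

```python
import json


def build_step(my_dict, script_path="s3://my-bucket/execute.py"):
    """Build one EMR step definition that carries my_dict as a JSON string.

    script_path, the step name and the spark-submit command are
    hypothetical; the question does not show them.
    """
    return {
        "Name": "run execute.py",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                script_path,
                "--my_dict",
                json.dumps(my_dict),  # the dictionary travels as a single string argument
            ],
        },
    }


def submit(cluster_id, step):
    """Send the step to a running cluster (needs boto3 and AWS credentials)."""
    import boto3  # imported here so build_step can be used without boto3 installed

    emr = boto3.client("emr")
    return emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])


my_dict = {"level_one_key": {"level_two_key": "level_two_value"}}
step = build_step(my_dict)
print(step["HadoopJarStep"]["Args"][-1])
```

Printing the last argument before submitting makes it easy to compare what was sent with what the EMR UI later shows in the Arguments section.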
Once the step has been submitted, the args look like this in the Arguments section of the EMR step in the UI:
--my_dict "{\"level_one_key\": {\"level_two_key\": \"level_two_value\"}}"
I also pass EMR a Python file to run, called execute.py. The arguments above are passed into execute.py's main function, where the string is immediately converted back into a dictionary with json.loads(). It looks like:
parser.add_argument('--my_dict', type=json.loads, required=False)
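As a point of comparison, here is a minimal local sketch of that parser. The parse_args wrapper is hypothetical and only exists so the parser can be fed a list directly instead of sys.argv. It suggests that argparse and json.loads handle a nested dictionary without trouble when the full string arrives.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse --my_dict, letting argparse call json.loads on the raw string."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--my_dict", type=json.loads, required=False)
    return parser.parse_args(argv)


# Simulate the argument exactly as it should reach execute.py.
args = parse_args(
    ["--my_dict", '{"level_one_key": {"level_two_key": "level_two_value"}}']
)
print(args.my_dict["level_one_key"]["level_two_key"])
```

If this works locally but the EMR step fails, the string is probably being altered somewhere between submission and execute.py rather than by the parsing code.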
The problem: when I pass a nested dictionary, the step fails after 20 seconds with UNKNOWN ERROR as the reason, and no logs are written at all. :(
However, when I pass it as a flat dictionary, like:
--my_dict "{\"level_one_key\": \"level_one_value\"}"
it works fine.
I don't want to post too much of my code, because this is work-related. But am I missing something? I feel like I should be able to pass a nested dictionary with no problem. I have also tried converting the dictionary after it has been passed to main, like this:
parser.add_argument('--my_dict', type=str, required=False)
my_dict = json.loads(args.my_dict)
Yet it still fails. Any ideas?
Update: when the nested dictionary is parsed as a str in execute.py (instead of with json.loads) and printed out, it looks like:
{"level_one_key": {"level_two_key": "level_two_value"
The last two closing braces of the dictionary are missing, for whatever reason. Obviously, this is what causes the error, but I don't know why the entire dictionary is not being passed into main.
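A truncation like this is easier to spot if execute.py reports the raw argument whenever parsing fails. The helper below is a sketch of that idea. The name load_json_arg is made up for illustration, and writing to stderr is an assumption about where the step's output can be inspected.

```python
import json
import sys


def load_json_arg(raw):
    """Return the parsed dictionary, or None after reporting the raw string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # repr() shows exactly which characters arrived, including any missing braces.
        print(
            f"--my_dict is not valid JSON ({exc.msg} at char {exc.pos}): {raw!r}",
            file=sys.stderr,
        )
        return None


# The value printed in the update above, and the value that was intended.
truncated = '{"level_one_key": {"level_two_key": "level_two_value"'
complete = truncated + "}}"

print(load_json_arg(truncated))
print(load_json_arg(complete))
```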
Recommended answer
I was able to resolve this by adding another key-value pair, one that is not nested, to the end of the dictionary. Like this:
--my_arg "{\"level_one_key\": {\"level_two_key\": \"level_two_value\"}, \"level_one_second_key\": \"level_one_second_value\"}"
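The same workaround can be automated so that callers do not have to remember the extra key. This is only a sketch: the answer does not explain why the closing braces get dropped, so the code simply guarantees that the serialized string does not end in two consecutive braces. PAD_KEY and both function names are hypothetical, and the sketch assumes my_dict does not already use that key.

```python
import json

PAD_KEY = "_pad"  # hypothetical flat key, assumed not to be used by my_dict itself


def dumps_padded(d):
    """Serialize d with a flat key last, so the string does not end in '}}'."""
    padded = dict(d)      # copy so the caller's dictionary is left unchanged
    padded[PAD_KEY] = ""  # dicts keep insertion order, so this key is written last
    return json.dumps(padded)


def loads_padded(raw):
    """Parse the string in execute.py and drop the padding key again."""
    d = json.loads(raw)
    d.pop(PAD_KEY, None)
    return d


my_dict = {"level_one_key": {"level_two_key": "level_two_value"}}
arg = dumps_padded(my_dict)
print(arg)
print(loads_padded(arg) == my_dict)
```

On the submitting side, dumps_padded(my_dict) would replace json.dumps(my_dict). In execute.py, loads_padded could be passed as the type= argument of parser.add_argument in place of json.loads.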