How to pass parameters to a Python streaming script in Hive?
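Problem description

A Hive user can stream a table through a script to transform its data:

ADD FILE replace-nan-with-zeros.py;

SELECT
  TRANSFORM (...)
  USING 'python replace-nan-with-zeros.py'
  AS (...)
FROM some_table;

I have a simple Python script: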

#!/usr/bin/env python
import sys

kFirstColumns = 7

def main(argv):
    for line in sys.stdin:
        line = line.strip()
        inputs = line.split('\t')

        # replace NaNs with zeros
        outputs = []
        columnIndex = 1
        for value in inputs:
            newValue = value
            if columnIndex > kFirstColumns:
                newValue = value.replace('NaN', '0.0')
            outputs.append(newValue)
            columnIndex = columnIndex + 1

        print('\t'.join(outputs))

if __name__ == "__main__":
    main(sys.argv[1:])

How can I make kFirstColumns a command-line argument, or some other kind of parameter, to this Python script? Thank you!

The solution is really trivial. Anything that follows the script name in the USING string is passed to the script as a command-line argument, so use

ADD FILE replace-nan-with-zeros.py;

SELECT
  TRANSFORM (...)
  USING 'python replace-nan-with-zeros.py 7'
  AS (...)
FROM some_table;

instead of just

...
  USING 'python replace-nan-with-zeros.py'
...

It works fine for me. The Python script should be changed to:

kFirstColumns = int(sys.argv[1])
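
Putting the pieces together, the whole script with the parameter applied might look like the sketch below. The helper name replace_nans, the constant DEFAULT_FIRST_COLUMNS, and the fallback to 7 when no argument is given are assumptions made for illustration and are not part of the answer above. The sketch also splits each row with rstrip('\n') rather than strip(), so that empty trailing columns are kept; that is a choice of this sketch, not something the original script does.

```python
#!/usr/bin/env python
import sys

# Assumed fallback used when no argument is supplied (not from the original answer).
DEFAULT_FIRST_COLUMNS = 7


def replace_nans(line, first_columns):
    """Replace 'NaN' with '0.0' in every column after the first `first_columns`."""
    values = line.rstrip('\n').split('\t')
    head = values[:first_columns]  # left untouched
    tail = [v.replace('NaN', '0.0') for v in values[first_columns:]]
    return '\t'.join(head + tail)


def main(argv, stdin=sys.stdin, stdout=sys.stdout):
    # argv[0] is the value appended to the command in the USING clause.
    first_columns = int(argv[0]) if argv else DEFAULT_FIRST_COLUMNS
    for line in stdin:
        stdout.write(replace_nans(line, first_columns) + '\n')


if __name__ == "__main__":
    main(sys.argv[1:])
```

The script can be tried outside Hive by piping a tab-separated row into it, for example printf 'a\tNaN\tNaN\n' | python replace-nan-with-zeros.py 1, which leaves the first column alone and replaces the two NaN values with 0.0.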