JQ, Hadoop: taking command from a file
Question
I have been enjoying the powerful filters provided by JQ (Doc).
Twitter's public API gives nicely formatted JSON files. I have access to a large amount of them, and I have access to a Hadoop cluster. There I decided that, instead of loading them into Pig using Elephantbird, I would try out JQ in mapper streaming to see if it is any faster.
Here is my final query:
nohup hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar \
-files $HOME/bin/jq \
-D mapreduce.map.memory.mb=2048 \
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
-mapper "./jq --raw-output 'select((.lang == \"en\") and (.entities.hashtags | length > 0)) | .entities.hashtags[] as \$tags | [.id_str, .user.id_str, .created_at, \$tags.text] | @csv'" \
-reducer NONE \
-input /path/to/input/*.json.gz \
-output /path/to/output \
&
I am distributing my local jq executable to every compute node and telling them to run my command with it on their stdin stream.
The query is long enough that I ran into quoting and formatting issues in bash and JQ.
I wish I could have written something like this:
nohup hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar \
-files $HOME/bin/jq,$PROJECT_DIR/cmd.jq \
-D mapreduce.map.memory.mb=2048 \
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
-mapper "./jq --raw-output --run-cmd-file=cmd.jq" \
-reducer NONE \
-input /path/to/input/*.json.gz \
-output /path/to/output \
&
where I could just put my command in a file, ship it to the compute nodes, and call it with an option.
Answer
It looks like you somehow missed the -f FILE option!
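As a sketch of how this could look (the file name cmd.jq and the use of $PROJECT_DIR follow the wished-for command above; jq's real flag for reading a filter from a file is -f / --from-file):

```shell
# cmd.jq holds the filter, free of shell-quoting escapes:
cat > cmd.jq <<'EOF'
select((.lang == "en") and (.entities.hashtags | length > 0))
| .entities.hashtags[] as $tags
| [.id_str, .user.id_str, .created_at, $tags.text]
| @csv
EOF

# Ship cmd.jq next to the jq binary via -files, and have the mapper
# read its filter from the shipped file with -f:
nohup hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar \
-files $HOME/bin/jq,$PROJECT_DIR/cmd.jq \
-D mapreduce.map.memory.mb=2048 \
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
-mapper "./jq --raw-output -f cmd.jq" \
-reducer NONE \
-input /path/to/input/*.json.gz \
-output /path/to/output \
&
```

Since -f replaces the inline filter argument, the entire quoting problem in the -mapper string goes away.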