JQ, Hadoop: taking command from a file

Problem description

I have been enjoying the powerful filters provided by jq.

Twitter's public API returns nicely formatted JSON. I have a large amount of this data and access to a Hadoop cluster. Instead of loading the files in Pig using Elephant Bird, I decided to try jq as a streaming mapper to see whether it is any faster.

Here is my final query:

nohup hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar \
    -files $HOME/bin/jq \
    -D mapreduce.map.memory.mb=2048 \
    -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
    -mapper "./jq --raw-output 'select((.lang == \"en\") and (.entities.hashtags | length > 0)) | .entities.hashtags[] as \$tags | [.id_str, .user.id_str, .created_at, \$tags.text] | @csv'" \
    -reducer NONE \
    -input /path/to/input/*.json.gz \
    -output /path/to/output \
    &

This distributes my local jq executable to every compute node and tells each node to run my filter with it against its stdin stream.

The query is long enough that I ran into quoting and formatting issues between bash and jq.

I wish I could have written something like this:

nohup hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar \
        -files $HOME/bin/jq,$PROJECT_DIR/cmd.jq \
        -D mapreduce.map.memory.mb=2048 \
        -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
        -mapper "./jq --raw-output --run-cmd-file=cmd.jq" \
        -reducer NONE \
        -input /path/to/input/*.json.gz \
        -output /path/to/output \
        &

where I can just put my command in a file, ship it to compute nodes and call it with an option.

Recommended answer

It looks like you somehow missed the -f FILE option!
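
As a minimal sketch of how that could look here: the filter from the question moves into a file, which is shipped alongside the jq binary and read with -f (long form --from-file). The `submit_job` wrapper name and the default for `PROJECT_DIR` are assumptions made for illustration; the paths and job options are copied from the question as-is.

```shell
# Assumption: PROJECT_DIR is where cmd.jq should live; default to the current directory.
PROJECT_DIR="${PROJECT_DIR:-$PWD}"

# Write the filter to a file. The quoted 'EOF' delimiter stops the shell from
# expanding $tags, so none of the backslash escaping from the inline version is needed.
cat > "$PROJECT_DIR/cmd.jq" <<'EOF'
# English tweets with at least one hashtag, one CSV row per hashtag
select((.lang == "en") and (.entities.hashtags | length > 0))
| .entities.hashtags[] as $tags
| [.id_str, .user.id_str, .created_at, $tags.text]
| @csv
EOF

# Hypothetical wrapper (not part of Hadoop or jq) so the job is only submitted
# when called explicitly.
submit_job() {
    hadoop jar "$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar" \
        -files "$HOME/bin/jq,$PROJECT_DIR/cmd.jq" \
        -D mapreduce.map.memory.mb=2048 \
        -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
        -mapper "./jq --raw-output -f cmd.jq" \
        -reducer NONE \
        -input /path/to/input/*.json.gz \
        -output /path/to/output
}
```

Calling `submit_job` submits the job. Files listed in -files are made available in each task's working directory under their base names, which is why the mapper refers to plain `cmd.jq`. To check the filter before submitting, a single tweet can be piped through `jq --raw-output -f cmd.jq` locally.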
