Configuring pig relation with Hadoop


Problem description

I'm having trouble understanding the relation between Hadoop and Pig. I understand that Pig's purpose is to hide the MapReduce pattern behind a scripting language, Pig Latin.

What I don't understand is how Hadoop and Pig are linked. So far, the only installation procedures seem to assume that Pig is run on the same machine as the main Hadoop node. Indeed, it uses the Hadoop configuration files.

Is this because Pig only translates the scripts into MapReduce code and sends them to Hadoop?

If that's the case, how could I configure Pig so that it sends the scripts to a remote server?

If not, does it mean we always need to have Hadoop running alongside Pig?

Answer

Pig can run in two modes:

  1. Local mode. In this mode the Hadoop cluster is not used at all. All processes run in a single JVM and files are read from the local filesystem. To run Pig in local mode, use the command:

pig -x local 
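
    For example, here is a minimal sketch of a small word-count job run entirely in local mode; the script name, input file, and output directory below are hypothetical:

    -- wordcount.pig (hypothetical): count word occurrences in a local text file
    lines   = LOAD 'input.txt' AS (line:chararray);
    words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grouped = GROUP words BY word;
    counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
    STORE counts INTO 'wordcount_out';

    pig -x local wordcount.pig

    In local mode both 'input.txt' and 'wordcount_out' refer to paths on the local filesystem.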

  2. MapReduce mode. In this mode Pig converts the scripts into MapReduce jobs and runs them on a Hadoop cluster. This is the default mode.

    The cluster can be local or remote. Pig uses the HADOOP_MAPRED_HOME environment variable to find the Hadoop installation on the local machine (see Installing Pig).

    If you want to connect to a remote cluster, you should specify the cluster parameters in the pig.properties file. Example for MRv1:

    fs.default.name=hdfs://namenode_address:8020/
    mapred.job.tracker=jobtracker_address:8021
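
    For a Hadoop 2 / YARN (MRv2) cluster, the equivalent settings would look roughly like the following sketch; the addresses and ports are placeholders and the exact keys you need depend on your cluster setup:

    # rough MRv2/YARN sketch (placeholder addresses; adjust to your cluster)
    fs.defaultFS=hdfs://namenode_address:8020/
    mapreduce.framework.name=yarn
    yarn.resourcemanager.address=resourcemanager_address:8032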
    

    You can also specify the remote cluster address on the command line:

    pig -fs namenode_address:8020 -jt jobtracker_address:8021
    

Hence, you can install Pig on any machine and connect to a remote cluster. Pig includes a Hadoop client, so you don't have to install Hadoop to use Pig.
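
For instance, a machine that only has Pig installed could submit the hypothetical wordcount.pig script from the local-mode sketch above to a remote cluster, reusing the flags shown earlier (MapReduce mode is the default, so no -x flag is needed):

    # rough sketch: submit the script to the remote cluster
    # 'input.txt' and 'wordcount_out' now resolve against HDFS rather than the local filesystem
    pig -fs namenode_address:8020 -jt jobtracker_address:8021 wordcount.pig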
