Configuring pig relation with Hadoop

Question

I'm having trouble understanding the relation between Hadoop and Pig. I understand Pig's purpose is to hide the MapReduce pattern behind a scripting language, Pig Latin.

What I don't understand is how Hadoop and Pig are linked. So far, the only installation procedures I have found seem to assume that Pig runs on the same machine as the main Hadoop node. Indeed, it uses the Hadoop configuration files.

Is this because Pig only translates the scripts into MapReduce code and sends them to Hadoop?

If that's the case, how can I configure Pig so that it sends the scripts to a remote server?

If not, does it mean we always need to have Hadoop running alongside Pig?

Answer

Pig can run in two modes:

  1. Local mode. In this mode the Hadoop cluster is not used at all. All processes run in a single JVM and files are read from the local filesystem. To run Pig in local mode, use the command:

pig -x local 
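
     For example, a minimal word-count script run entirely in local mode might look like the sketch below (the script name wordcount.pig and the input path /tmp/input.txt are hypothetical -- use any local text file):

     -- wordcount.pig: reads from the local filesystem, no Hadoop cluster involved
     lines  = LOAD '/tmp/input.txt' AS (line:chararray);
     words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
     grpd   = GROUP words BY word;
     counts = FOREACH grpd GENERATE group, COUNT(words);
     DUMP counts;

     Running it with pig -x local wordcount.pig executes everything in one local JVM and prints the counts to the console.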

  2. MapReduce mode. In this mode Pig converts scripts to MapReduce jobs and runs them on a Hadoop cluster. This is the default mode.

    The cluster can be local or remote. Pig uses the HADOOP_MAPRED_HOME environment variable to find the Hadoop installation on the local machine (see Installing Pig).

    If you want to connect to a remote cluster, you should specify the cluster parameters in the pig.properties file. Example for MRv1:

    fs.default.name=hdfs://namenode_address:8020/
    mapred.job.tracker=jobtracker_address:8021
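
    For a Hadoop 2 / YARN (MRv2) cluster, the analogous pig.properties entries would look roughly like this (a sketch only; the addresses are placeholders and the ports shown are the usual defaults):

    fs.defaultFS=hdfs://namenode_address:8020/
    mapreduce.framework.name=yarn
    yarn.resourcemanager.address=resourcemanager_address:8032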
    

    You can also specify the remote cluster address at the command line:

    pig -fs namenode_address:8020 -jt jobtracker_address:8021
    

Hence, you can install Pig on any machine and connect to a remote cluster. Pig includes the Hadoop client, therefore you don't have to install Hadoop to use Pig.
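
Putting it together, a rough sketch of submitting a script from such a client-only machine (the cluster addresses and the script name are placeholders; in MapReduce mode the paths in LOAD statements resolve against HDFS rather than the local filesystem):

    pig -x mapreduce \
        -fs namenode_address:8020 \
        -jt jobtracker_address:8021 \
        wordcount.pig

Since MapReduce mode is the default, the -x mapreduce flag is optional here.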
