Where exactly does Pig store its relations?


Question

I am confused about the following two points. 1) Where exactly does the LOAD statement store this relation (student)? Is it on HDFS, in Pig's internal storage, or on the local machine?

Example: student = LOAD 'HDFS:/student' using PigStorage(',');

2) If I run DUMP student; it takes almost 30-40 seconds to display the result, whereas the LOAD statement takes only 1-2 seconds. If the data is being retrieved from Pig's internal storage, why is there such a delay?

I would be grateful if anyone could clear up these doubts (preferably by describing the flow of execution). Thanks in advance.

My environment: I am using a VM for learning purposes.

Recommended answer

LOAD does not store the data; the relation is just a pointer to the file. When a LOAD statement is executed, no MapReduce task runs.

A MapReduce job is initiated only when a DUMP or STORE statement is reached. Once we see our data in the output, we can confirm that it was loaded successfully.
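The flow of execution described above can be sketched in Pig Latin. This is a minimal illustration, not a definitive script: it reuses the path from the question as-is, and the added DESCRIBE and EXPLAIN statements are only there to show operators that inspect the relation without launching a job.

```pig
-- Path copied from the question; adjust it to your own cluster.
-- LOAD only records where the data lives and how to parse it.
-- No data is read and no MapReduce job is started here.
student = LOAD 'HDFS:/student' USING PigStorage(',');

-- DESCRIBE prints the schema of the relation (unknown here, since no
-- AS clause was given). It does not launch a job.
DESCRIBE student;

-- EXPLAIN prints the logical, physical, and MapReduce plans that Pig
-- would use to compute the relation. It does not launch a job either.
EXPLAIN student;

-- DUMP forces the plan to run: this is where the MapReduce job is
-- submitted, the file is actually read, and the tuples are printed.
DUMP student;
```

Running these statements one by one in the Grunt shell should show the first three returning almost immediately, with the wait appearing only at DUMP.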

DUMP takes time because it is the point at which the MapReduce job is actually launched and run; it also disables multi-query execution, which slows execution further. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)

If you want to store any data, you can use the STORE command.
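As a rough sketch of persisting a relation with STORE: the input path is the one from the question, while the output directory name `/student_out` is a hypothetical placeholder chosen for illustration.

```pig
-- Input path copied from the question.
student = LOAD 'HDFS:/student' USING PigStorage(',');

-- STORE triggers the job and writes the result as comma-separated
-- part files under the given directory.
-- '/student_out' is a hypothetical output location; the directory
-- must not already exist, or the job will fail.
STORE student INTO '/student_out' USING PigStorage(',');
```

Unlike DUMP, which only prints tuples to the console, STORE leaves the result in the file system, where a later script can LOAD it again.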
