Where exactly does Pig store its relations?
Question
I am confused about the following two points. 1) Where exactly does the LOAD statement store this relation (student): on HDFS, in Pig's internal storage, or on the local machine?
Example: student = LOAD 'HDFS:/student' using PigStorage(',');
2) If I run DUMP student; it takes almost 30-40 seconds to display the result, whereas the LOAD statement takes 1-2 seconds. If we are just retrieving data from Pig's internal storage, why is there this delay?
I would be grateful if anyone could clear up these doubts (preferably by explaining the flow of execution). Thanks in advance.
My environment: I am using a VM for learning purposes.
Recommended answer
LOAD does not store the data; it is just a pointer to the file. When a LOAD statement is executed, no MapReduce task is run.
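A minimal sketch of this, reusing the path from the question: in the Grunt shell the statements below should return almost immediately, because Pig only records them in its plan. DESCRIBE and EXPLAIN inspect the relation and the planned jobs without running one. No schema is declared here, so DESCRIBE is expected to report the schema as unknown.

```pig
-- Defines the relation only; no MapReduce job is launched at this point.
student = LOAD 'HDFS:/student' USING PigStorage(',');

-- Prints the schema of the relation (unknown here, since no AS clause was given).
DESCRIBE student;

-- Prints the logical, physical, and MapReduce plans Pig would use to compute student.
EXPLAIN student;
```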
It is only after a DUMP or STORE statement that a MapReduce job is initiated. Once we see our data in the output, we can confirm that it has been loaded successfully.
DUMP takes time because it is the statement that actually launches the job, and it also disables multi-query execution, which slows execution down. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)
If you want to store any data, you can use the STORE command.
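As a rough illustration, the script below persists the relation instead of printing it. The output path 'student_out' is a hypothetical name chosen for this example, and the output directory is assumed not to exist yet, since the job fails if it does.

```pig
-- Defines the relation; nothing is read yet.
student = LOAD 'HDFS:/student' USING PigStorage(',');

-- Launches the job and writes the tuples as comma-separated text.
-- 'student_out' is a hypothetical output directory, not taken from the question.
STORE student INTO 'student_out' USING PigStorage(',');
```

With STORE the results land in the file system Pig is running against (HDFS in MapReduce mode), so they can be reloaded later without recomputing them.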