Where exactly does Pig store its relations?
Question
I am confused about the following two points. 1) Where exactly does the LOAD statement store this relation (student)? Is it on HDFS, in Pig's internal storage, or on the local machine?
example : student = LOAD 'HDFS:/student' using PigStorage(',');
2) If I run DUMP student; it takes almost 30-40 seconds to display the result, whereas the LOAD statement takes 1-2 seconds. If we are retrieving data from Pig's internal storage, why is there this delay?
I would be grateful if anyone could clear up these doubts (preferably with the flow of execution). Thanks in advance.
My environment: I am using a VM for learning purposes.
Recommended answer
LOAD does not store the data; it only records a pointer to the file. When a LOAD statement is executed, no MapReduce task is run.
A MapReduce job is initiated only when a DUMP or STORE statement is reached. At that point the data appears in the output, which confirms that it was loaded successfully.
DUMP takes time because it is the statement that actually launches the MapReduce job, whereas LOAD only records where the data lives. DUMP also disables multi-query execution, which is likely to slow execution further. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)
If you want to persist any data, use the STORE command.
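The flow of execution described above can be sketched as a short Pig Latin script. This is only an illustration, not output from a real session: the input path is copied from the question, while the schema (id, name) and the output directory 'student_out' are assumptions made up for the example.

```pig
-- LOAD only records where the data lives and how to parse it.
-- No MapReduce job runs here, which is why it returns in a second or two.
-- The path is taken from the question; the schema is an assumption.
student = LOAD 'HDFS:/student' USING PigStorage(',') AS (id:int, name:chararray);

-- DESCRIBE prints the schema of the relation without reading any data.
DESCRIBE student;

-- DUMP forces the plan to execute: a job is launched, the file is read,
-- and the tuples are printed to the console. The delay comes from here.
DUMP student;

-- STORE also triggers execution, but writes the result to a directory
-- instead of the console. 'student_out' is a hypothetical output path.
STORE student INTO 'student_out' USING PigStorage(',');
```

In a script kept for regular use, the DUMP line would normally be dropped and only the STORE kept, so that Pig can plan the whole script as one multi-query execution.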