Where exactly does Pig store its relations?


Question

I am confused about the following two points. 1) Where exactly does the LOAD statement store this relation (student)? Is it on HDFS, in Pig's internal storage, or on the local machine?

Example: student = LOAD 'HDFS:/student' using PigStorage(',');

2) If I run DUMP student; it takes almost 30-40 seconds to display the result, whereas the LOAD statement takes 1-2 seconds. If we are retrieving data from Pig's internal storage, why is there this delay?

I would be grateful if anyone could clear up these doubts (preferably by explaining the flow of execution). Thanks in advance.

My environment: I am using a VM for learning purposes.

Recommended answer

LOAD does not store the data; the relation is just a pointer to the file. When the LOAD statement is executed, Pig only parses it and adds it to its plan, and no MapReduce job is run.
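To see this for yourself, here is a minimal sketch for the Grunt shell. The path '/student' and the two-field schema are assumptions made up for illustration; they are not from the question.

```pig
-- Hypothetical input path and schema, for illustration only.
student = LOAD '/student' USING PigStorage(',') AS (id:int, name:chararray);

-- DESCRIBE only prints the schema declared above.
-- It does not read the input file and does not start a MapReduce job.
DESCRIBE student;
```

Both statements should return almost immediately, because nothing has been read from the file yet.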

A MapReduce job is initiated only when a DUMP or STORE statement is reached. At that point the data is actually read, we see it in the output, and we can confirm that it was loaded successfully.

DUMP takes time because it is the statement that actually triggers the job, so the 30-40 seconds include launching and running it. DUMP also disables multi-query execution, which slows down execution. (If you have included DUMP statements in your scripts for debugging purposes, you should remove them.)
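If you still want a quick look at the data while debugging, one common approach is to DUMP only a small sample. This is a sketch under assumptions: the path '/student' and the alias student_sample are hypothetical.

```pig
-- Hypothetical input path, for illustration only.
student = LOAD '/student' USING PigStorage(',');

-- Keep only a few rows so that less data is printed to the console.
student_sample = LIMIT student 10;

-- DUMP is the statement that triggers execution; LOAD and LIMIT alone do not.
DUMP student_sample;
```

The job still has to be launched, so this will not be instant. On a learning VM, starting Pig with `pig -x local` runs against the local file system instead of the cluster, which may shorten the wait for small files.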

If you want to store any data, you can use the STORE command.
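A minimal sketch of STORE, assuming a hypothetical input path '/student' and a hypothetical output directory '/student_out':

```pig
-- Hypothetical input path, for illustration only.
student = LOAD '/student' USING PigStorage(',');

-- STORE triggers execution and writes the relation to the output directory.
-- The output directory is an assumption; it should not already exist.
STORE student INTO '/student_out' USING PigStorage(',');
```

Unlike DUMP, which prints to the console, STORE writes the result as files, so this is the only point at which Pig itself persists the relation.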
