How to load files on a Hadoop cluster using Apache Pig?

Views: 37 | Published: 2021/11/12 4:04:07 | Tags: hadoop, apache-pig

Question

I have a Pig script that needs to load files from our local Hadoop cluster. I can list the files with the Hadoop command hadoop fs -ls /repo/mydata, but when I try to load them in the Pig script, it fails. The LOAD statement looks like this:

in = LOAD '/repo/mydata/2012/02' USING PigStorage() AS (event:chararray, user:chararray);

The error message is:

Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/repo/mydata/2012/02

Any ideas? Thanks.

Recommended answer

My suggestion:

Create a folder in HDFS: hadoop fs -mkdir /pigdata

Copy the file into the new HDFS folder: hadoop fs -put /opt/pig/tutorial/data/excite-small.log /pigdata

(Alternatively, do the same from the Grunt shell: grunt> copyFromLocal /opt/pig/tutorial/data/excite-small.log /pigdata)

Execute the Pig Latin script:

   grunt> set debug on

   grunt> set job.name 'first-p2-job'

   grunt> log = LOAD 'hdfs://hostname:54310/pigdata/excite-small.log' AS 
              (user:chararray, time:long, query:chararray); 
   grunt> grpd = GROUP log BY user; 
   grunt> cntd = FOREACH grpd GENERATE group, COUNT(log); 
   grunt> STORE cntd INTO 'output';
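For readers less familiar with Pig Latin, the GROUP/COUNT part of the script above can be mimicked locally in a few lines of Python. This is only an in-memory illustration under stated assumptions: records are tab-separated (PigStorage's default delimiter, which LOAD uses when no USING clause is given), the first field is the user, and Pig's null-handling rules are ignored. The function name and the sample records are hypothetical, not taken from excite-small.log.

```python
from collections import Counter
from typing import Dict, Iterable


def count_queries_per_user(lines: Iterable[str]) -> Dict[str, int]:
    """Rough local equivalent of:
    log  = LOAD ... AS (user:chararray, time:long, query:chararray);
    grpd = GROUP log BY user;
    cntd = FOREACH grpd GENERATE group, COUNT(log);
    (simplified: Pig's treatment of null/empty fields is not modeled)
    """
    counts: Counter = Counter()
    for line in lines:
        # PigStorage() splits each record on tab characters by default.
        fields = line.rstrip("\n").split("\t")
        user = fields[0]  # first declared column: user
        counts[user] += 1  # each record adds one tuple to its user's group
    return dict(counts)


# Hypothetical sample records in (user, time, query) order.
sample = [
    "alice\t970916105432\tpig latin\n",
    "bob\t970916105500\thadoop\n",
    "alice\t970916105600\thdfs\n",
]
print(count_queries_per_user(sample))  # {'alice': 2, 'bob': 1}
```

On the cluster, Pig performs the same grouping as distributed jobs and writes the (user, count) tuples to the STORE location rather than returning a dictionary.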

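The results are written to a directory named output. Because 'output' is a relative path, it is resolved against Grunt's current working directory (by default your HDFS home directory, /user/<username>); to store the results at hdfs://hostname:54310/pigdata/output, use that full URI in the STORE statement.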
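The file: prefix in the question's error (file:/repo/mydata/2012/02) indicates that Pig resolved the scheme-less path against the local filesystem instead of HDFS. This typically happens when Pig runs in local mode (pig -x local) or does not pick up the cluster's configuration. A full hdfs://hostname:54310/... URI avoids this, because a path that carries an explicit scheme is not resolved against the default filesystem. The Python sketch below is a simplified model of that rule, not Hadoop's actual code; the helper qualify and the user name alice are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def qualify(path: str, default_fs: str, working_dir: str) -> str:
    """Hypothetical, simplified model of how a path becomes a full URI.

    default_fs:  default filesystem URI, e.g. "file:///" or "hdfs://hostname:54310"
    working_dir: current directory on that filesystem, used for relative paths
    """
    # A path that already has a scheme (hdfs://..., file:...) is used unchanged.
    if urlsplit(path).scheme:
        return path
    # A relative path is resolved against the current working directory.
    if not path.startswith("/"):
        path = posixpath.join(working_dir, path)
    # The default filesystem URI is then prepended.
    return default_fs.rstrip("/") + path


# Local filesystem as default: reproduces the path from the question's error.
print(qualify("/repo/mydata/2012/02", "file:///", "/home/alice"))
# -> file:/repo/mydata/2012/02

# Cluster filesystem as default (hypothetical user "alice"):
print(qualify("/repo/mydata/2012/02", "hdfs://hostname:54310", "/user/alice"))
# -> hdfs://hostname:54310/repo/mydata/2012/02
print(qualify("output", "hdfs://hostname:54310", "/user/alice"))
# -> hdfs://hostname:54310/user/alice/output
```

The same model explains why the original LOAD path would also work unchanged if Pig ran in MapReduce mode with the cluster configuration available, since HDFS would then be the default filesystem.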
