How to load files on a Hadoop cluster using Apache Pig?


Problem description


I have a Pig script and need to load files from the local Hadoop cluster. I can list the files using a hadoop command: hadoop fs -ls /repo/mydata, but when I try to load the files in the Pig script, it fails. The load statement is:

in = LOAD '/repo/mydata/2012/02' USING PigStorage() AS (event:chararray, user:chararray)

The error message is:

Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/repo/mydata/2012/02

Any ideas? Thanks.

Solution

My suggestion:

  1. Create a folder in HDFS: hadoop fs -mkdir /pigdata

  2. Load the file into the created HDFS folder: hadoop fs -put /opt/pig/tutorial/data/excite-small.log /pigdata

(or you can do it from the grunt shell as grunt> copyFromLocal /opt/pig/tutorial/data/excite-small.log /pigdata)

  3. Execute the Pig Latin script:

       grunt> set debug on
    
       grunt> set job.name 'first-p2-job'
    
       grunt> log = LOAD 'hdfs://hostname:54310/pigdata/excite-small.log' AS 
                  (user:chararray, time:long, query:chararray); 
       grunt> grpd = GROUP log BY user; 
       grunt> cntd = FOREACH grpd GENERATE group, COUNT(log); 
       grunt> STORE cntd INTO 'output';
    

  4. The output file will be stored in hdfs://hostname:54310/pigdata/output
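
For the original question, the key hint is the file: scheme in the error, which suggests Pig resolved the path against the local filesystem rather than HDFS. Pointing the LOAD at an explicit hdfs:// URI (or starting Pig in mapreduce mode so bare paths resolve in HDFS) should avoid that. A minimal sketch, assuming the namenode is at hostname:54310 as above and the files under /repo/mydata/2012/02 are tab-delimited (the alias events is arbitrary):

       $ pig -x mapreduce            # mapreduce mode: unqualified paths resolve against HDFS
       grunt> events = LOAD 'hdfs://hostname:54310/repo/mydata/2012/02'
                  USING PigStorage() AS (event:chararray, user:chararray);
       grunt> DUMP events;           -- quick check that records are actually read from HDFS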
