Pig beginner's example [unexpected error]

Question

I am new to Linux and Apache Pig. I am following this tutorial to learn pig: http://salsahpc.indiana.edu/ScienceCloud/pig_word_count_tutorial.htm

This is a basic word counting example. The data file 'input.txt' and the program file 'wordcount.pig' are in the Wordcount package, linked on the site.
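
For orientation, a word count of this kind can be sketched with standard shell tools. This is only an illustration, not the tutorial's script: the sample text is made up, and words are split on whitespace only, which may differ from how Pig's TOKENIZE splits.

```shell
# Hypothetical two-line sample standing in for input.txt.
sample=$(mktemp)
printf 'hello world\nhello pig\n' > "$sample"

# Put one word per line, sort so equal words are adjacent,
# count each run, then normalize the output to "count word".
counts=$(tr -s '[:space:]' '\n' < "$sample" | sort | uniq -c | awk '{print $1, $2}')
echo "$counts"

rm -f "$sample"
```

Comparing such a count against Pig's output on a small file is one way to check that a script behaves as expected.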

I already have Pig 0.11.1 downloaded on my local machine, as well as Hadoop, and Java 6.

When I downloaded the Wordcount package it took me to a "tar.gz" file. I am unfamiliar with this type, and wasn't sure how to extract it. It contains the files 'input.txt','wordcount.pig' and a Readme file. I saved 'input.txt' to my Desktop. I wasn't sure where to save wordcount.pig, and decided to just type in the commands line by line in the shell.
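
A .tar.gz file is a gzip-compressed tar archive, and the tar command can list and unpack it. A minimal sketch follows, assuming the archive is named wordcount.tar.gz (a hypothetical name; the real download may be named differently). The example first builds a stand-in archive with placeholder contents so that it runs on its own.

```shell
# Build a stand-in archive so the example is self-contained.
# File contents here are placeholders, not the tutorial's files.
workdir=$(mktemp -d)
mkdir "$workdir/src"
echo "hello world hello" > "$workdir/src/input.txt"
echo "A = load 'input.txt';" > "$workdir/src/wordcount.pig"
tar -czf "$workdir/wordcount.tar.gz" -C "$workdir/src" input.txt wordcount.pig

# List the contents without extracting: t = list, z = gzip, f = archive file.
tar -tzf "$workdir/wordcount.tar.gz"

# Extract into a chosen directory: x = extract, -C = target directory.
mkdir "$workdir/extracted"
tar -xzf "$workdir/wordcount.tar.gz" -C "$workdir/extracted"
ls "$workdir/extracted"
```

For a real download, only the tar -xzf line is needed, run against the downloaded file.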

I ran pig in local mode as follows: pig -x local

and then I just copy-pasted each line of the wordcount.pig script at the grunt> prompt like this:

A = load '/home/me/Desktop/input.txt';

B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;

C = group B by word;

D = foreach C generate COUNT(B), group;

dump D;

This generates the following errors: ...

Retrying connect to server: localhost/127.0.0.1:8021. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2043: Unexpected error during execution.

My questions:

1. Should I save 'input.txt' and the original 'wordcount.pig' script in some special folder inside the pig-0.11.1 directory? That is, should I create a folder called word inside pig-0.11.1, put 'wordcount.pig' and 'input.txt' there, and then type "wordcount.pig" at the grunt> prompt? In general, if I have data in, say, 'dat.txt' and a script, say, 'program.pig', where should I save them to run 'program.pig' from the grunt shell? I think they should both go in pig-0.11.1, so I can do $ pig -x local wordcount.pig, but I am not sure.

2. Why am I not able to run the script line by line as I tried to? I have specified the location of the file 'input.txt' in the load statement. So why does it not just run the commands line by line and dump the contents of D to my screen???

3. When I try to run Pig in mapreduce mode using $pig, it gives this error:

retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) 2013-06-03 23:57:06,956 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage

Recommended answer

This error indicates that Pig is unable to connect to Hadoop to run the job. You say you have downloaded Hadoop -- have you installed it? If you have installed it, have you started it up according to its docs -- have you run the bin/start-all.sh script? Using -x local tells Pig to use the local filesystem instead of HDFS, but it still needs a running Hadoop instance to perform the execution. Before trying to run Pig, follow the Hadoop docs to get your local "cluster" set up and make sure your NameNode, DataNodes, etc. are up and running.
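
One quick way to see whether anything is listening at the address from the log (localhost:8021) is to attempt a TCP connection. The sketch below assumes bash is installed and uses its /dev/tcp pseudo-path; the host and port are copied from the error message, and the variable names are only illustrative.

```shell
# Host and port copied from the "Retrying connect to server" log line.
host=localhost
port=8021

# bash can open a TCP connection through /dev/tcp/HOST/PORT;
# the command exits 0 only if the connection succeeds.
if bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    status=open
else
    status=closed
fi
echo "$host:$port is $status"
```

A result of "closed" means no process accepted the connection, which would be consistent with the Hadoop daemons not having been started. The jps tool that ships with the JDK lists running Java processes and can serve as a second check.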
