Pig ORDER command fails


Question

I am trying to analyze an Apache log; the goal is to find all user agents and their usage percentages. The following program works fine up to the line where result contains each user agent, its count, and its percentage. It fails at the last line, when it tries to order by most used. Could someone help?

logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen, user, time, method, uri, protocol, statusCode, responseSize, referer, userAgent);

uarows = FOREACH logs GENERATE userAgent;
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
dump total; 

gpuarows = GROUP uarows BY userAgent;
result = FOREACH gpuarows {
       subtotal = COUNT(uarows);
       GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL, 100*(double)subtotal/(double)total.count AS percentage;
       };
orderresult = ORDER result BY SUB_TOTAL DESC;
dump orderresult;
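For reference, $LOGS in the LOAD statement is a Pig parameter, so the script is typically invoked with -param so that Pig substitutes the log path; the script filename below is an assumption, and the log path is the one shown in the run's Input(s) section:

pig -x local -param LOGS=/home/dliu/ApacheLogAnalysisWithPig/access.log useragent_stats.pig   # script name is hypothetical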

What's weird is that 'dump result' works just fine, so it's the ORDER line that causes the trouble.

Error:

2013-04-13 11:33:09,976 [Thread-48] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-04-13 11:33:09,976 [Thread-48] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-04-13 11:33:09,995 [Thread-48] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
    ... 6 more
2013-04-13 11:33:10,276 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005
2013-04-13 11:33:10,276 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases orderresult
2013-04-13 11:33:10,276 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: orderresult[16,14] C:  R: 
2013-04-13 11:33:15,286 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-04-13 11:33:15,286 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0005 has failed! Stop running all dependent jobs
2013-04-13 11:33:15,287 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-04-13 11:33:15,287 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-04-13 11:33:15,288 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
1.0.4   0.11.0  dliu    2013-04-13 11:32:27 2013-04-13 11:33:15 GROUP_BY,ORDER_BY

Some jobs have failed! Stop running all dependent jobs

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime  MinMapTIme  AvgMapTime  MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime    Alias   Feature Outputs
job_local_0002  1   1   n/a n/a n/a n/a n/a n/a 1-18,logs,total,uarows  MULTI_QUERY,COMBINER    
job_local_0003  1   1   n/a n/a n/a n/a n/a n/a gpuarows,result GROUP_BY,COMBINER   
job_local_0004  1   1   n/a n/a n/a n/a n/a n/a orderresult SAMPLER 

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_local_0005  orderresult ORDER_BY    Message: Job failed! Error - NA file:/tmp/temp265162785/tmp896004388,

Input(s):
Successfully read 0 records from: "file:///home/dliu/ApacheLogAnalysisWithPig/access.log"

Output(s):
Failed to produce result in "file:/tmp/temp265162785/tmp896004388"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local_0002  ->  job_local_0003,
job_local_0003  ->  job_local_0004,
job_local_0004  ->  job_local_0005,
job_local_0005


2013-04-13 11:33:15,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2013-04-13 11:33:15,297 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias orderresult
Details at logfile: /home/dliu/ApacheLogAnalysisWithPig/pig_1365823931459.log

Answer

Make sure of two things:

1) Run Pig in local mode: pig -x local
2) Set either the PIG_HOME or PIG_INSTALL environment variable to point to the Pig installation directory
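A minimal sketch of that setup, assuming a typical installation path (adjust the path and script name to your environment):

export PIG_HOME=/usr/local/pig        # or PIG_INSTALL=/usr/local/pig; path is an assumption
export PATH=$PATH:$PIG_HOME/bin
pig -x local myscript.pig             # local mode: reads and writes the local filesystem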

