Get MAX value per year in Apache Pig


Problem description

I have been trying to get the maximum temperature per year from the data below. The actual data looks like this, but I am only interested in the first column (the timestamp, from which I take the year) and the fourth column (the temperature).

2016-11-03 12:00:00.000 +0100,Mostly Cloudy,rain,10.594444444444443,10.594444444444443,0.73,13.2664,174.0,10.1913,0.0,1019.74,Partly cloudy throughout the day.
2016-11-03 13:00:00.000 +0100,Mostly Cloudy,rain,11.072222222222223,11.072222222222223,0.72,13.1698,176.0,12.4131,0.0,1019.45,Partly cloudy throughout the day.
2016-11-03 14:00:00.000 +0100,Mostly Cloudy,rain,11.172222222222222,11.172222222222222,0.71,12.654600000000002,175.0,10.835300000000002,0.0,1019.16,Partly cloudy throughout the day.
2016-11-03 15:00:00.000 +0100,Mostly Cloudy,rain,10.911111111111111,10.911111111111111,0.72,11.753,170.0,10.867500000000001,0.0,1018.94,Partly cloudy throughout the day.
2016-11-03 16:00:00.000 +0100,Mostly Cloudy,rain,10.350000000000001,10.350000000000001,0.72,10.6582,161.0,11.592,0.0,1018.81,Partly cloudy throughout the day.


DUMP B produces output like this:
(2014,12.038889)
(2014,21.055555) 
(2016,29.905556)
(2016,30.605556)
(2016,29.95)
(2016,29.972221)

The code I have written is below, but it throws an error at D. I have also tried the ToDate function, but that does not seem to work either.

A = load 'file.csv' using PigStorage(',')......
B = foreach A GENERATE SUBSTRING(year,0,4) as year1, Atemp;
C = group B by year1;  
D = foreach C GENERATE group,MAX(Atemp);  

I get the error:

Invalid field projection. Projected field [year1] does not exist in schema: group:chararray,B:bag{:tuple(year1:chararray,Atemp:float)}.

Recommended answer

I figured it out myself after posting the question on Stack Overflow :) I wonder why! Instead of D = foreach C GENERATE group,MAX(Atemp); I used D = foreach C GENERATE group, MAX(B.Atemp) as max; and it works!
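
The reason the fix works follows from the schema printed in the error message: after GROUP, each record of C has just two fields, group and a bag named B, so Atemp can only be reached through the bag as B.Atemp. Below is a minimal sketch of the whole pipeline. It assumes the columns are read by position ($0 for the timestamp, $3 for the temperature) because the schema in the original LOAD statement is elided, and the alias max_temp is an illustrative name that does not appear in the original post.

```pig
-- Load every column positionally; no column names are assumed here.
A = LOAD 'file.csv' USING PigStorage(',');

-- $0 is the timestamp (first column), $3 is the temperature (fourth column).
-- The first four characters of the timestamp are the year.
B = FOREACH A GENERATE SUBSTRING((chararray)$0, 0, 4) AS year1, (float)$3 AS Atemp;

-- Grouping yields one record per year: the key plus a bag of all B tuples for it.
C = GROUP B BY year1;

-- Prints the schema of C, roughly:
-- C: {group: chararray, B: {(year1: chararray, Atemp: float)}}
DESCRIBE C;

-- MAX takes a bag, so the temperature column is projected out of the bag B.
D = FOREACH C GENERATE group AS year1, MAX(B.Atemp) AS max_temp;

DUMP D;
```

Running DESCRIBE on the grouped relation is a quick way to see which field names are valid inside the following FOREACH. For the six sample rows shown under DUMP B alone, D would contain (2014,21.055555) and (2016,30.605556).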

If anyone wants me to delete the post, I'm happy to do so. Kindly let me know.
