将JSON数组加载到Pig中 [英] Load JSON array into Pig

查看:184
本文介绍了将JSON数组加载到Pig中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个json文件,格式如下:

  [
{
id: 2,
createdBy:0,
status:0,
utcTime:2014年10月14日下午4:49:47,
placeName :21 / F,坎宁安主路,Sampangi Rama NagarBengaluruKarnatakaIndia,
经度:77.5983817,
纬度:12.9832418,
createdDate:2014年9月16日2: 59:03 PM,
准确性:5,
loginType:1,
mobileNo:0000005567
},
{
id:4,
createdBy:0,
status:0,
utcTime:Oct 14,2014 4:52:48 PM,
placeName:坎宁安主路21号,Sampangi Rama NagarBengaluruKarnatakaIndia,
经度:77.5983817,
纬度:12.9832418,
createdDate: 2014年10月8日下午5:24:42,
准确性:5,
loginType:1,
mobileNo:0000005566
}

$ / code>

当我尝试将数据加载到p ig使用JsonLoader类我得到一个错误,像意外的输入结束:期望关闭标记OBJECT

  a = LOAD' /user/root/jsoneg/exp.json'使用JsonLoader('id:int,createdBy:int,status:int,utcTime:chararray,placeName:chararray,longitude:double,latitude:double,createdDate:chararray,accuracy:double ,loginType:双,mobileNo:chararray'); 
b = foreach生成$ 0,$ 1,$ 2;
dump b;


解决方案

我也遇到过类似的问题,后来我才知道Pig JSON不支持多行json格式。它总会期望json输入必须在单行中。

代替原生的Jsonloader,我建议你使用elephantbird json loader。这对Jsons格式非常有用。



您可以从以下链接下载jar包:

  http://www.java2s.com/Code/Jar/e/elephant.htm 

我将输入格式更改为单行,并通过elephantbird加载,如下所示:

  input.json 
{ test:[{id:2,createdBy:0,status:0,utcTime:2014年10月14日下午4:49:47,placeName:21 / F, Cunningham Main Rd,Sampangi Rama NagarBengaluruKarnatakaIndia,经度:77.5983817,纬度:12.9832418,createdDate:2014年9月16日下午2:59:03 PM,准确度:5,loginType:1, mobileNo:0000005567},{id:4,createdBy:0,status:0,utcTime:2014年10月14日下午4:52:48,placeName: Cunningham Main Rd,Sampangi Rama NagarBengaluruKarnatakaIndia 21 / F,经度:77.5983817,纬度:12.9832418,createdDate:2014年10月8日下午5点24分42秒,准确度:5,loginType :1,mobileNo:0000005566}]}

PigScript:
REGISTER'/ tmp / elephant-bi RD-Hadoop的COMPAT-4.1.jar;
REGISTER'/tmp/elephant-bird-pig-4.1.jar';

A = LOAD'input.json'USING com.twitter.elephantbird.pig.load.JsonLoader(' - nestedLoad');
B = FOREACH A GENERATE FLATTEN($ 0#'test');
C = FOREACH B GENERATE FLATTEN($ 0)as mymap;
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
DUMP D;

产量:
(Sampangi市坎宁安主路2,21 / F Rama Nagar班加罗尔卡纳塔克邦印度0)
(Sampangi坎宁安主路4,21楼Rama Nagar班加罗尔卡纳塔克邦印度, 0)


I have a json file with the following format

[
  {
    "id": 2,
    "createdBy": 0,
    "status": 0,
    "utcTime": "Oct 14, 2014 4:49:47 PM",
    "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
    "longitude": 77.5983817,
    "latitude": 12.9832418,
    "createdDate": "Sep 16, 2014 2:59:03 PM",
    "accuracy": 5,
    "loginType": 1,
    "mobileNo": "0000005567"
  },
  {
    "id": 4,
    "createdBy": 0,
    "status": 0,
    "utcTime": "Oct 14, 2014 4:52:48 PM",
    "placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
    "longitude": 77.5983817,
    "latitude": 12.9832418,
    "createdDate": "Oct 8, 2014 5:24:42 PM",
    "accuracy": 5,
    "loginType": 1,
    "mobileNo": "0000005566"
  }
]

when i try to load the data to pig using JsonLoader class I am getting a error like Unexpected end-of-input: expected close marker for OBJECT

a = LOAD '/user/root/jsoneg/exp.json' USING JsonLoader('id:int,createdBy:int,status:int,utcTime:chararray,placeName:chararray,longitude:double,latitude:double,createdDate:chararray,accuracy:double,loginType:double,mobileNo:chararray');
b = foreach a generate $0,$1,$2;
dump b;

解决方案

I am also faced similar kind of problem sometime back, later i came to know that Pig JSON will not support multiline json format. It will always expect the json input must be in single line.

Instead of native Jsonloader, i suggest you to use elephantbird json loader. It is pretty good for Jsons formats.

You can download the jars from the below link

http://www.java2s.com/Code/Jar/e/elephant.htm

I changed your input format to single line and loaded through elephantbird as below

input.json
{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]}

PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';

A = LOAD 'input.json ' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE FLATTEN($0#'test');
C = FOREACH B GENERATE FLATTEN($0) AS mymap;
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
DUMP D;

Output:
(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)

这篇关于将JSON数组加载到Pig中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆