将 JSON 数组加载到 Pig [英] Load JSON array into Pig
问题描述
我有一个格式如下的 json 文件
<预><代码>[{身份证":2,"createdBy": 0,状态":0,"utcTime": "2014 年 10 月 14 日下午 4:49:47","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",经度":77.5983817,纬度":12.9832418,"createdDate": "2014 年 9 月 16 日下午 2:59:03",准确度":5,登录类型":1,手机号":0000005567"},{身份证":4,"createdBy": 0,状态":0,"utcTime": "2014 年 10 月 14 日下午 4:52:48","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",经度":77.5983817,纬度":12.9832418,"createdDate": "2014 年 10 月 8 日下午 5:24:42",准确度":5,登录类型":1,手机号":0000005566"}]当我尝试使用 JsonLoader 类将数据加载到 Pig 时,我收到一个错误,如 Unexpected end-of-input: expected close marker for OBJECT
a = LOAD '/user/root/jsoneg/exp.json' USING JsonLoader('id:int,createdBy:int,status:int,utcTime:chararray,placeName:chararray,longitude:double,latitude:double,createdDate:chararray,accuracy:double,loginType:double,mobileNo:chararray');b = foreach a 生成 $0,$1,$2;转储 b;
前段时间我也遇到过类似的问题,后来我才知道 Pig JSON 不支持多行 json 格式.它总是期望 json 输入必须在单行中.
我建议您不要使用本机 Jsonloader,我建议您使用大象鸟 json 加载器.非常适合 Jsons 格式.
您可以从以下链接下载罐子
http://www.java2s.com/Code/Jar/e/elephant.htm
我将您的输入格式更改为单行并通过如下所示的大象鸟加载
input.json{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","Type:"1log","mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName":"21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM",","accyloginType": 1,"mobileNo": "0000005566"}]}猪脚本:注册'/tmp/elephant-bird-hadoop-compat-4.1.jar';注册'/tmp/elephant-bird-pig-4.1.jar';A = LOAD 'input.json' 使用 com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');B = FOREACH A GENERATE FLATTEN($0#'test');C = FOREACH B GENERATE FLATTEN($0) 作为 mymap;D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';转储 D;输出:(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
I have a json file with the following format
[
{
"id": 2,
"createdBy": 0,
"status": 0,
"utcTime": "Oct 14, 2014 4:49:47 PM",
"placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
"longitude": 77.5983817,
"latitude": 12.9832418,
"createdDate": "Sep 16, 2014 2:59:03 PM",
"accuracy": 5,
"loginType": 1,
"mobileNo": "0000005567"
},
{
"id": 4,
"createdBy": 0,
"status": 0,
"utcTime": "Oct 14, 2014 4:52:48 PM",
"placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
"longitude": 77.5983817,
"latitude": 12.9832418,
"createdDate": "Oct 8, 2014 5:24:42 PM",
"accuracy": 5,
"loginType": 1,
"mobileNo": "0000005566"
}
]
when i try to load the data to pig using JsonLoader class I am getting a error like Unexpected end-of-input: expected close marker for OBJECT
a = LOAD '/user/root/jsoneg/exp.json' USING JsonLoader('id:int,createdBy:int,status:int,utcTime:chararray,placeName:chararray,longitude:double,latitude:double,createdDate:chararray,accuracy:double,loginType:double,mobileNo:chararray');
b = foreach a generate $0,$1,$2;
dump b;
I am also faced similar kind of problem sometime back, later i came to know that Pig JSON will not support multiline json format. It will always expect the json input must be in single line.
Instead of native Jsonloader, i suggest you to use elephantbird json loader. It is pretty good for Jsons formats.
You can download the jars from the below link
http://www.java2s.com/Code/Jar/e/elephant.htm
I changed your input format to single line and loaded through elephantbird as below
input.json
{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]}
PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
A = LOAD 'input.json ' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE FLATTEN($0#'test');
C = FOREACH B GENERATE FLATTEN($0) AS mymap;
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
DUMP D;
Output:
(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
这篇关于将 JSON 数组加载到 Pig的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!