Reading multi-line JSON using Apache Pig


Problem description


I have a JSON file and want to read it using Apache Pig.

I tried the regular JsonLoader, but it looks like JsonLoader works only with single-line JSON. Then I tried Elephant-Bird, but I still cannot get the correct result. Can anyone suggest a solution?

Input:

{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}

Note: I don't want to convert the input into a single line.

Script:

A = LOAD 'input' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');       
B = FOREACH A GENERATE FLATTEN($0#'employees');    
Dump B;

Expected result:

([firstName#John,lastName#Doe])                                      
([firstName#Anna,lastName#Smith])                                 
([firstName#Peter,lastName#Jones])  

Solution

As mentioned in the comments by siva, the answer is basically that you do need to change your input to a single line.

Both JsonLoader and the Elephant-Bird loader work only with single-line JSON; they will not work with multi-line input. You need to convert your input to a single line before passing it to Pig. One workaround is to write a shell script that uses the 'sed' command to collapse the multi-line input into a single line, and then calls the Pig script from the same shell script. The original answer linked to instructions on how to call Pig from a shell script.
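
The workaround above could be sketched roughly as follows. This is only an illustration, not the exact script from the answer: the function name flatten_json, the file names, and the Pig parameter name "input" are all assumptions. The sed expression is one common way to join lines; it assumes the whole file is a single JSON document small enough to hold in memory.

```shell
#!/bin/sh
# flatten_json FILE
# Hypothetical helper: prints FILE with every newline replaced by a space,
# so a pretty-printed JSON document becomes a single line.
#   :a      define a label
#   N       append the next input line to the pattern space
#   $!ba    if this is not the last line, jump back to the label
#   s/\n/ /g  replace the embedded newlines with spaces
flatten_json() {
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' "$1"
}

# Hypothetical usage (file and script names are placeholders):
#   flatten_json employees.json > employees_single_line.json
#   pig -param input=employees_single_line.json read_employees.pig
```

With this approach the Pig script would load '$input' instead of a hard-coded path, and the original multi-line file stays untouched because the single-line copy is written to a separate file.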
