Multi-line JSON read using Apache Pig


Problem description

I have a JSON file that I want to read using Apache Pig.

I tried the regular JsonLoader, but it looks like JsonLoader works only with single-line JSON. Then I tried Elephant-Bird, but I still cannot get the correct results. Can anyone suggest a solution?

Input:

{"employees":[                                          
         {"firstName":"John", "lastName":"Doe"},              
         {"firstName":"Anna", "lastName":"Smith"},                      
         {"firstName":"Peter", "lastName":"Jones"}             
]}      

Note: I do not want to convert the input into a single line.

Script:

A = LOAD 'input' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');       
B = FOREACH A GENERATE FLATTEN($0#'employees');    
Dump B;

The expected result is:

([firstName#John,lastName#Doe])                                      
([firstName#Anna,lastName#Smith])                                 
([firstName#Peter,lastName#Jones])  

Recommended answer

As siva mentioned in the comments, the short answer is that you do need to convert your input to a single line.

Both JsonLoader and the Elephant-Bird loader work only with single-line JSON; they will not work with multi-line input. You need to convert your input to a single line before passing it to Pig. One workaround is to write a shell script that collapses the multi-line file into a single line with the sed command and then calls the Pig script. (The original answer linked to a page explaining how to call Pig from a shell script.)
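The preprocessing step can be sketched as follows. This is only an illustration and not the sed-based script the answer describes: it swaps in Python's standard json module to collapse the document, which assumes the file holds exactly one JSON document and fits in memory. The function names to_single_line and convert_file are hypothetical helpers introduced for this example.

```python
import json


def to_single_line(text: str) -> str:
    """Parse one JSON document and re-serialize it without any newlines."""
    # Compact separators drop the spaces after ',' and ':' as well.
    return json.dumps(json.loads(text), separators=(",", ":"))


def convert_file(src_path: str, dst_path: str) -> None:
    """Rewrite a multi-line JSON file as a single line for a line-oriented loader.

    Assumption: src_path contains exactly one JSON document.
    """
    with open(src_path, encoding="utf-8") as src:
        line = to_single_line(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(line + "\n")
```

For instance, calling convert_file('input.json', 'input_single.json') (both file names are placeholders) would produce a one-line file that the LOAD statement in the question's script could then point at. Parsing and re-serializing, rather than simply deleting newline characters, also fails early if the input is not valid JSON.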

