JSON Array field handling in Elephant-Bird UDF in PIG

Views: 84 | Published: 2020/9/3 20:12:52 | Tags: json, hadoop, user-defined-functions, apache-pig

Question

A quick question about JSON handling in Pig.

I tried the JsonLoader from Elephant-Bird to load and process JSON data like the following:

{
   "SV":1,
   "AD":[
      {
         "ID":"46931606",
         "C1":"46",
         "C2":"469",
         "ST":"46931",
         "PO":1
      },
      {
         "ID":"46721489",
         "C1":"46",
         "C2":"467",
         "ST":"46721",
         "PO":5
      }
   ]
}

The loader works well for simple fields, but not for array fields. How can I access the elements of the array (the "AD" field above), either with this UDF or in some other way? Please advise.

Recommended answer

You should use the -nestedLoad parameter, like this:

a = LOAD 'input' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

Then use the following code:

b = FOREACH a GENERATE (json#'AD') as AD:bag{t:Tuple(m:map[])};

Your JSON array then becomes a bag. You can flatten the bag to get one tuple per array element.

c = FOREACH b GENERATE FLATTEN(AD);
d = FOREACH c GENERATE AD::m#'ID' AS ID, AD::m#'C1' AS C1, AD::m#'C2' AS C2, AD::m#'ST' AS ST, AD::m#'PO' AS PO;

At this point, you get a tuple whose schema is (ID:bytearray, C…)
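
To make the bag-and-flatten step concrete, here is a rough plain-Python analogy applied to the sample record from the question. It is not Pig and not how Elephant-Bird works internally; it only mirrors the data shapes. The names `ad_bag`, `FIELDS`, and `rows` are illustrative assumptions. Note also that Python keeps the JSON value types, whereas the schema above reports bytearray.

```python
import json

# The sample record from the question, as one JSON document.
record = json.loads("""
{"SV": 1,
 "AD": [
   {"ID": "46931606", "C1": "46", "C2": "469", "ST": "46931", "PO": 1},
   {"ID": "46721489", "C1": "46", "C2": "467", "ST": "46721", "PO": 5}
 ]}
""")

# Analogue of json#'AD': the array, which Pig would see as a bag of maps.
ad_bag = record["AD"]

# Analogue of FLATTEN(AD) plus the m#'KEY' lookups:
# one output row per array element, one column per key.
FIELDS = ("ID", "C1", "C2", "ST", "PO")  # hypothetical helper, not part of Pig
rows = [tuple(m[key] for key in FIELDS) for m in ad_bag]

for row in rows:
    print(row)
# ('46931606', '46', '469', '46931', 1)
# ('46721489', '46', '467', '46721', 5)
```

The single input record yields two output rows, which is the same fan-out that FLATTEN produces on the bag in relation `c`.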
