Store data after decompression in pig

Question

My file looks like this:

    ({"food":"Tacos", "person":"Alice", "amount":3})
    ({"food":"Tomato Soup", "person":"Sarah", "amount":2})
    ({"food":"Grilled Cheese", "person":"Alex", "amount":5})

I tried to store it using the following code:

STORE STOCK_A 
    INTO 'default.ash_json_pigtest' 
    USING HCatStorer();

The stored data looks like this:

    {"food":"Tacos", "person":"Alice", "amount":3}             None    None
    {"food":"Tomato Soup", "person":"Sarah", "amount":2}       None    None
    {"food":"Grilled Cheese", "person":"Alex", "amount":5}     None    None

Expected output

    Tacos           Alice   3
    Tomato Soup     Sarah   2
    Grilled Cheese  Alex    5

How can I achieve this? Thanks in advance.

Answer

Your problem is not how you store the data but how you load it. You have a JSON file, but you are reading each whole JSON object into a single field, so every row has only one field. When you save that into your HCatalog table, each row gets the JSON string in the first column and nulls in the other two.
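
To make the symptom concrete, here is a rough plain-Python analogy (not Pig or HCatalog code). It assumes each line of the file is one bare JSON object and that the target table has three columns; the helper names `load_whole_line` and `fit_to_table` are hypothetical. Padding short rows with `None` is only meant to reproduce the result shown in the question, not to describe how HCatStorer is implemented.

```python
# Hypothetical sample: the question's three input lines, assuming each
# line of the file holds one bare JSON object.
LINES = [
    '{"food":"Tacos", "person":"Alice", "amount":3}',
    '{"food":"Tomato Soup", "person":"Sarah", "amount":2}',
    '{"food":"Grilled Cheese", "person":"Alex", "amount":5}',
]

# Assumed layout of the target table: three columns.
TABLE_COLUMNS = ("food", "person", "amount")


def load_whole_line(lines):
    """Treat each line as one opaque field, as a plain text loader would."""
    return [(line,) for line in lines]


def fit_to_table(rows, columns):
    """Pad each tuple with None up to the table's column count."""
    return [row + (None,) * (len(columns) - len(row)) for row in rows]


rows = fit_to_table(load_whole_line(LINES), TABLE_COLUMNS)
for row in rows:
    # Whole JSON string in the first column, then None, None.
    print(row)
```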

Instead of loading the data with PigStorage or whatever you are using, load it with JsonLoader:

STOCK_A = LOAD 'your.data' USING JsonLoader('food:chararray, person:chararray, amount:int');
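
The schema string passed to JsonLoader tells Pig which fields to take from each JSON object and what type to give them. Below is a simplified Python analogy of that idea, not JsonLoader's actual implementation: `parse_schema`, `load_json_lines`, and the `PIG_TYPES` mapping are hypothetical, and the sketch looks fields up by name, which may not match every rule the real loader applies (in this data the key order matches the schema anyway).

```python
import json

# Hypothetical subset of Pig type names mapped to Python casts.
PIG_TYPES = {"chararray": str, "int": int, "long": int, "double": float}


def parse_schema(schema):
    """Split a 'name:type, name:type' string into (name, cast) pairs."""
    fields = []
    for part in schema.split(","):
        name, pig_type = (s.strip() for s in part.split(":"))
        fields.append((name, PIG_TYPES[pig_type]))
    return fields


def load_json_lines(lines, schema):
    """Parse each JSON line into a tuple ordered and typed by the schema."""
    fields = parse_schema(schema)
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append(tuple(
            None if record.get(name) is None else cast(record[name])
            for name, cast in fields
        ))
    return rows


LINES = [
    '{"food":"Tacos", "person":"Alice", "amount":3}',
    '{"food":"Tomato Soup", "person":"Sarah", "amount":2}',
    '{"food":"Grilled Cheese", "person":"Alex", "amount":5}',
]

rows = load_json_lines(LINES, "food:chararray, person:chararray, amount:int")
for row in rows:
    print(row)
```

Each tuple now has one value per table column, which matches the `(Tacos,Alice,3)` shape that DUMP prints once JsonLoader is used.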

You can DUMP the data to check that it is now correct:

DUMP STOCK_A;

(Tacos,Alice,3)
(Tomato Soup,Sarah,2)
(Grilled Cheese,Alex,5)

instead of the original output:

DUMP STOCK_A;

({"food":"Tacos", "person":"Alice", "amount":3})
({"food":"Tomato Soup", "person":"Sarah", "amount":2})
({"food":"Grilled Cheese", "person":"Alex", "amount":5})
