Pig: loading a data file using an external schema file

Question

I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema-file. I tried using

A= LOAD '<file path>' USING PigStorage('\u0001') as '<schema-file path>' 

but got an error.

What is the syntax for correctly loading the file?

The schema file looks like this:

data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"

Recommended answer

The AS clause takes the schema itself, written inline in parentheses, not the path to a schema file:

 A = LOAD '<file path>' USING PigStorage('\u0001') AS (type:long, id:chararray, nameformat:chararray);

Alternatively, a file named .pig_schema that contains the schema and sits in your input directory could work as well, though I have never tried it. It must be a JSON file with the following syntax, where each "type" value is a Pig DataType code (15 for long, 55 for chararray):

{"fields":[
        {"name":"type","type":15,"description":"Fu","schema":null},
        {"name":"id","type":55,"description":"Bar","schema":null},
        {"name":"nameFormat","type":55,"description":"Xu","schema":null}
    ] ,"version":0,"sortKeys":[],"sortKeyOrders":[]}

This file is also generated if you specify the -schema option when storing with PigStorage.
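
If the external schema file has to stay the source of truth, one option is to convert it into a .pig_schema file before loading. The Python sketch below does that for the sample schema from the question. It rests on assumptions that nothing above confirms: that the field name is the 2nd column and the type the 8th, that varchar(n) should become chararray, and that the description text is free-form. The helper names (pig_type_code, parse_schema_lines, to_pig_schema_json) are made up for illustration. The codes 15 and 55 are Pig's DataType values for long and chararray.

```python
import json

# Pig type codes as defined in org.apache.pig.data.DataType.
PIG_LONG = 15
PIG_CHARARRAY = 55

# Sample schema text copied from the question.
SAMPLE = r'''data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
'''


def pig_type_code(external_type):
    """Map a type name from the external schema file to a Pig type code.

    Assumption: 'long' maps to long and 'varchar(n)' maps to chararray.
    Extend this for other types that appear in your schema files.
    """
    if external_type == "long":
        return PIG_LONG
    if external_type.startswith("varchar"):
        return PIG_CHARARRAY
    raise ValueError("unmapped type: " + external_type)


def parse_schema_lines(lines):
    """Return (field name, Pig type code) pairs from the schema text.

    Assumption: the name is the 2nd whitespace-separated column and the
    type is the 8th; rows whose name is '-' (the record header) are skipped.
    """
    fields = []
    for line in lines:
        cols = line.split()
        if len(cols) < 8 or cols[1] == "-":
            continue
        fields.append((cols[1], pig_type_code(cols[7])))
    return fields


def to_pig_schema_json(fields):
    """Build .pig_schema JSON text for flat (non-nested) fields."""
    doc = {
        "fields": [
            {"name": name, "type": code, "description": "generated", "schema": None}
            for name, code in fields
        ],
        "version": 0,
        "sortKeys": [],
        "sortKeyOrders": [],
    }
    return json.dumps(doc)


if __name__ == "__main__":
    # Print the JSON; saving it as .pig_schema is left to the caller.
    print(to_pig_schema_json(parse_schema_lines(SAMPLE.splitlines())))
```

The printed JSON would then be saved as .pig_schema alongside the data files. Whether PigStorage picks it up on load is worth checking against your Pig version.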
