Pig: loading a data file using an external schema file
Problem description
I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema-file. I tried using
A = LOAD '<file path>' USING PigStorage('\u0001') as '<schema-file path>'
but I get an error.
What is the syntax for correctly loading the file?
The schema file format is something like:
data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
Solution
The AS clause specifies the schema directly; it does not take the path to a schema file. The schema is written as a parenthesized field list, not as a quoted string:
A = LOAD '<file path>' USING PigStorage('\u0001') AS (type:long, id:chararray, nameformat:chararray);
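Because AS only accepts an inline schema, one workaround is to generate the LOAD statement from the schema file before running the script. The following Python sketch is only an illustration: the column positions (field name in column 2, type in column 8), the rule that rows with "-" as the name are skipped, and the `PIG_TYPES` mapping are all assumptions inferred from the sample in the question, and `as_clause` is a hypothetical helper name.

```python
# Assumed layout (inferred from the sample in the question): whitespace-separated
# columns, field name in column 2, type in column 8. Rows whose name is "-"
# describe the record itself and are skipped.
PIG_TYPES = {"long": "long", "varchar": "chararray"}  # hypothetical mapping


def as_clause(schema_text):
    """Build the parenthesized field list for a Pig AS clause."""
    fields = []
    for line in schema_text.splitlines():
        cols = line.split()
        if len(cols) < 8 or cols[1] == "-":
            continue
        base_type = cols[7].split("(")[0].lower()  # varchar(50) -> varchar
        fields.append(f"{cols[1]}:{PIG_TYPES[base_type]}")
    return "(" + ", ".join(fields) + ")"


SAMPLE = r"""data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
"""

if __name__ == "__main__":
    clause = as_clause(SAMPLE)
    print(f"A = LOAD '<file path>' USING PigStorage('\\u0001') AS {clause};")
```

A type that is missing from `PIG_TYPES` raises a `KeyError`, which is deliberate: it is safer to stop than to guess a Pig type for an unknown column type.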
Alternatively, a file named
.pig_schema
containing the schema and located in your input directory could work as well. I have never tried that, though. It must be a JSON file with the following syntax, where each "type" value is one of Pig's numeric type codes (15 is long, 55 is chararray):
{"fields":[
 {"name":"type","type":15,"description":"Fu","schema":null},
 {"name":"id","type":55,"description":"Bar","schema":null},
 {"name":"nameFormat","type":55,"description":"Xu","schema":null}
],"version":0,"sortKeys":[],"sortKeyOrders":[]}
This file is also generated if you specify the -schema option when storing with PigStorage.
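If the .pig_schema route is used, the JSON could be produced from the external schema file instead of being written by hand. The sketch below is untested against a real Pig installation and rests on assumptions: the schema-file layout is guessed from the sample in the question (field name in column 2, type in column 8, rows named "-" skipped), `pig_schema` and `TYPE_CODES` are hypothetical names, and the empty "description" values are placeholders. The codes 15 and 55 are Pig's type codes for long and chararray.

```python
import json

# Pig type codes for the two types that occur in the sample schema file.
TYPE_CODES = {"long": 15, "varchar": 55}  # hypothetical mapping


def pig_schema(schema_text):
    """Convert the sample schema-file format into a .pig_schema-style dict."""
    fields = []
    for line in schema_text.splitlines():
        cols = line.split()
        if len(cols) < 8 or cols[1] == "-":
            continue  # skip blank lines and the record-level header row
        base_type = cols[7].split("(")[0].lower()  # varchar(50) -> varchar
        fields.append({
            "name": cols[1],
            "type": TYPE_CODES[base_type],
            "description": "",
            "schema": None,
        })
    return {"fields": fields, "version": 0, "sortKeys": [], "sortKeyOrders": []}


SAMPLE = r"""data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
"""

if __name__ == "__main__":
    # Print the JSON; saving it as .pig_schema in the input directory is left
    # to the reader, since that location depends on the cluster setup.
    print(json.dumps(pig_schema(SAMPLE)))
```

The output would still need to be copied into the same directory as the data file under the name .pig_schema for PigStorage to have a chance of finding it.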