Pig: loading a data file using an external schema file
Problem description
I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema-file. I tried using
A= LOAD '<file path>' USING PigStorage('\u0001') as '<schema-file path>'
but got an error.
What is the syntax for correctly loading the file?
The schema file format looks like this:
data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
Recommended answer
The AS clause is for specifying the schema directly, not the path to the schema file.
A = LOAD '<file path>' USING PigStorage('\u0001') AS (type:long, id:chararray, nameformat:chararray);
Alternatively, a file named .pig_schema containing the schema and located in your input directory could work as well, though I have never tried that. It must be a JSON file with the following syntax, where each type is one of Pig's numeric DataType codes (15 for long, 55 for chararray):
{"fields":[
{"name":"type","type":15,"description":"Fu","schema":null},
{"name":"id","type":55,"description":"Bar","schema":null},
{"name":"nameFormat","type":55,"description":"Xu","schema":null}
],"version":0,"sortKeys":[],"sortKeyOrders":[]}
This file is also generated if you specify the -schema option when storing with PigStorage.
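If you would rather derive the .pig_schema file from the external schema file in the question than write it by hand, something like the following Python sketch might do. It rests on several assumptions that are not confirmed anywhere above: that the field name is the second whitespace-separated column and the type is the eighth, that rows with "-" in the name column are record-level headers to skip, and that long maps to Pig type code 15 and varchar(n) to 55 (chararray). The function names and the "autogenerated" description are placeholders.

```python
import json

# Pig DataType codes used in .pig_schema (org.apache.pig.data.DataType).
PIG_LONG = 15
PIG_CHARARRAY = 55


def pig_type_code(source_type):
    """Map a type name from the external schema file to a Pig type code."""
    base = source_type.split("(")[0].lower()  # "varchar(50)" -> "varchar"
    if base == "long":
        return PIG_LONG
    if base == "varchar":
        return PIG_CHARARRAY
    raise ValueError("no mapping defined for type: " + source_type)


def parse_schema_lines(lines):
    """Extract (name, type) pairs; ASSUMES name is column 2 and type is column 8."""
    fields = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 8 or tokens[1] == "-":
            continue  # skip short/blank lines and the record-level header row
        fields.append((tokens[1], tokens[7]))
    return fields


def build_pig_schema(fields):
    """Build the .pig_schema JSON text for a flat list of (name, type) pairs."""
    schema = {
        "fields": [
            {"name": name, "type": pig_type_code(source_type),
             "description": "autogenerated", "schema": None}
            for name, source_type in fields
        ],
        "version": 0,
        "sortKeys": [],
        "sortKeyOrders": [],
    }
    return json.dumps(schema)


# The schema file lines from the question, used here as sample input.
SAMPLE = [
    "data1 - complex - - - - format - -",
    "data1 event_type - - - - - long - \"ends '\\001'\"",
    "data1 event_id - - - - - varchar(50) - \"ends '\\001'\"",
    "data1 name_format - - - - - varchar(10) - \"ends newline\"",
]

if __name__ == "__main__":
    # Save this output as a file named .pig_schema in the input directory.
    print(build_pig_schema(parse_schema_lines(SAMPLE)))
```

Because the JSON is produced by json.dumps, it cannot pick up syntax slips such as a trailing comma after the last field. Any type other than long or varchar raises an error rather than being guessed.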