Export from Pig to CSV
Question
I'm having a lot of trouble getting data out of Pig and into a CSV that I can use in Excel or SQL (or R or SPSS, etc.) without a lot of manipulation.
I've tried using the following:
STORE pig_object INTO '/Users/Name/Folder/pig_object.csv'
USING CSVExcelStorage(',','NO_MULTILINE','WINDOWS');
It creates a folder of that name containing lots of part-m-0000# files. I can later join them all up using cat part* > filename.csv, but there's no header, which means I have to add it manually.
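The manual merge described above can be sketched as a short shell script. The directory name, column names, and sample rows below are illustrative assumptions, not taken from the original job:

```shell
# Simulate Pig's output directory with two part files (illustrative data).
mkdir -p pig_object.csv
printf 'Alice,Smith,30,NY\n' > pig_object.csv/part-m-00000
printf 'Bob,Jones,25,LA\n'   > pig_object.csv/part-m-00001

# Write the header by hand, then append every part file after it.
echo 'firstname,lastname,age,location' > filename.csv
cat pig_object.csv/part-m-* >> filename.csv

cat filename.csv
```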
I've read that PigStorageSchema is supposed to create another bit with a header, but it doesn't seem to work at all; e.g., I get the same result as with a plain STORE, and no header file:
STORE pig_object INTO '/Users/Name/Folder/pig_object'
USING org.apache.pig.piggybank.storage.PigStorageSchema();
(I've tried this in both local and mapreduce mode.)
Is there any way of getting the data out of Pig into a simple CSV file without these multiple steps?
Any help would be much appreciated!
Answer
I'm afraid there isn't a one-liner which does the job, but you can come up with the following (Pig v0.10.0):
A = load '/user/hadoop/csvinput/somedata.txt' using PigStorage(',')
as (firstname:chararray, lastname:chararray, age:int, location:chararray);
store A into '/user/hadoop/csvoutput' using PigStorage('\t','-schema');
When PigStorage takes '-schema', it will create a '.pig_schema' and a '.pig_header' in the output directory. You then have to merge '.pig_header' with the 'part-x-xxxxx' files:
1. If the result needs to be copied to the local disk:
hadoop fs -rm /user/hadoop/csvoutput/.pig_schema
hadoop fs -getmerge /user/hadoop/csvoutput ./output.csv
(Since -getmerge takes an input directory, you need to get rid of .pig_schema first.)
2. Storing the result on HDFS:
hadoop fs -cat /user/hadoop/csvoutput/.pig_header /user/hadoop/csvoutput/part-x-xxxxx \
  | hadoop fs -put - /user/hadoop/csvoutput/result/output.csv
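Since the HDFS commands above just concatenate the header file with the data files, the same step can be mimicked on the local filesystem. This is only a sketch; the paths and sample data are assumptions for illustration:

```shell
# Recreate what PigStorage('\t', '-schema') leaves behind (illustrative data).
mkdir -p csvoutput/result
printf 'firstname\tlastname\tage\tlocation\n' > csvoutput/.pig_header
printf 'Alice\tSmith\t30\tNY\n'               > csvoutput/part-m-00000

# Local equivalent of: hadoop fs -cat header part | hadoop fs -put - output.csv
cat csvoutput/.pig_header csvoutput/part-m-00000 > csvoutput/result/output.csv
cat csvoutput/result/output.csv
```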
For further reference you might also have a look at these posts:
STORE output to a single CSV?
How can I concatenate two files in hadoop into one using Hadoop FS shell?