Export from Pig to CSV


Question

I'm having a lot of trouble getting data out of Pig and into a CSV that I can use in Excel or SQL (or R, SPSS, etc.) without a lot of manipulation.

I've tried using the following:

REGISTER piggybank.jar;  -- CSVExcelStorage ships in Piggybank
STORE pig_object INTO '/Users/Name/Folder/pig_object.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'WINDOWS');

It creates a folder of that name containing lots of part-m-0000# files. I can later join them all up using cat part* > filename.csv, but there's no header, which means I have to put it in manually.
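That manual step can be scripted. A minimal sketch of prepending a header while concatenating the part files (the part-file names, contents, and column names below are invented placeholders, not taken from the question):

```shell
# Stand-in part files mimicking Pig's part-m-0000# output
# (names and contents are invented for illustration):
printf 'John,Doe,42,NYC\n'    > part-m-00000
printf 'Jane,Roe,31,London\n' > part-m-00001

# Write the header line first, then append every part file in name order:
echo 'firstname,lastname,age,location' > filename.csv
cat part-m-* >> filename.csv
```

The resulting filename.csv opens directly in Excel with the column names on the first row.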

I've read that PigStorageSchema is supposed to create another bit with a header, but it doesn't seem to work at all; e.g., I get the same result as if it's just stored, with no header file:

STORE pig_object INTO '/Users/Name/Folder/pig_object'
    USING org.apache.pig.piggybank.storage.PigStorageSchema();

(I've tried this in both local and mapreduce mode.)

Is there any way of getting the data out of Pig into a simple CSV file without these multiple steps?

Any help would be much appreciated!

Answer

I'm afraid there isn't a one-liner which does the job, but you can come up with the following (Pig v0.10.0):

A = load '/user/hadoop/csvinput/somedata.txt' using PigStorage(',') 
      as (firstname:chararray, lastname:chararray, age:int, location:chararray);
store A into '/user/hadoop/csvoutput' using PigStorage('\t','-schema');

When PigStorage is given '-schema', it will create a '.pig_schema' and a '.pig_header' in the output directory. Then you have to merge '.pig_header' with the 'part-x-xxxxx' files:

1. If the result needs to be copied to the local disk:

hadoop fs -rm /user/hadoop/csvoutput/.pig_schema
hadoop fs -getmerge /user/hadoop/csvoutput ./output.csv

(Since -getmerge takes an input directory, you need to get rid of .pig_schema first.)
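The delete matters because -getmerge concatenates every file in the input directory, including the hidden schema file. A local simulation of the same step (directory layout and file contents below are invented placeholders standing in for Pig's output):

```shell
# Mimic the Pig output directory locally (all contents invented):
mkdir -p csvoutput
echo 'schema metadata placeholder'            > csvoutput/.pig_schema
printf 'firstname\tlastname\tage\tlocation\n' > csvoutput/.pig_header
printf 'John\tDoe\t42\tNYC\n'                 > csvoutput/part-m-00000

# -getmerge would sweep up the hidden schema file too, so delete it first:
rm csvoutput/.pig_schema

# Local stand-in for `hadoop fs -getmerge csvoutput ./output.csv`:
cat csvoutput/.pig_header csvoutput/part-m-* > output.csv
```

Without the rm, the schema metadata would land as a stray line inside output.csv.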

2. Storing the result on HDFS:

hadoop fs -cat /user/hadoop/csvoutput/.pig_header \
    /user/hadoop/csvoutput/part-x-xxxxx \
  | hadoop fs -put - /user/hadoop/csvoutput/result/output.csv


For further reference you might also have a look at these posts:
STORE output to a single CSV?
How can I concatenate two files in hadoop into one using Hadoop FS shell?

