Extract first line of CSV file in Pig
Question
I have several CSV files, and the header is always the first line of each file. What's the best way to get that line out of the CSV file as a string in Pig? Preprocessing with sed, awk, etc. is not an option.
I've tried loading the file with the regular PigStorage and with the Piggybank CsvLoader, but it's not clear to me how I can get that first line, if at all.
I'm open to writing a UDF, if that's what it takes.
Recommended answer
If your CSV files comply with the CSV conventions of Excel 2007, you can use the loader that is already available in Piggybank: http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup
It has an option, SKIP_INPUT_HEADER, that skips the CSV header when loading.
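A minimal sketch of how that option might be passed to the loader. The jar location and the input path are assumptions for illustration, and the other three constructor arguments (delimiter, multiline treatment, end-of-line treatment) are just one plausible choice; adjust them to your data.

```pig
-- Assumption: piggybank.jar is in the working directory; adjust the path for your installation.
REGISTER piggybank.jar;

-- 'input.csv' is a hypothetical path.
-- Arguments: delimiter, multiline treatment, end-of-line treatment, header treatment.
rows = LOAD 'input.csv'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(
           ',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER');

-- Print the loaded records; the header line should not appear among them.
DUMP rows;
```

Note that with this option the header line is dropped from the loaded relation rather than returned as a string.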