Is it possible to load a Parquet table directly from a file?

Question

I have a binary data file that can be converted to CSV format. Is there any way to load a Parquet table directly from it? Many tutorials show loading a CSV file into a text table, and then loading from the text table into a Parquet table. From an efficiency point of view, is it possible to load a Parquet table directly from a binary file like the one I already have, ideally using the CREATE EXTERNAL TABLE command? Or do I need to convert it to a CSV file first? Is there any file format restriction?

Answer

Unfortunately, it is not possible to read from a custom binary format in Impala. You should convert your files to CSV, then create an external table over the existing CSV files as a temporary staging table, and finally insert into the final Parquet table by reading from the temporary CSV table. The Impala Parquet documentation has much more information and some related examples; see the section about compacting small files, which describes a similar workflow.
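As a rough illustration of that workflow, here is a small Python helper that only builds the three Impala statements as strings (staging table over CSV, final Parquet table, then the copy). The table names, column list, and HDFS directory are hypothetical placeholders, and how you submit the statements (impala-shell or a client library) is left out on purpose.

```python
def build_impala_statements(staging, final, columns, csv_dir):
    """Return the SQL statements for a CSV -> Parquet load, in order.

    staging, final: table names (hypothetical, pick your own).
    columns: list of (name, impala_type) pairs describing the CSV fields.
    csv_dir: HDFS directory that already holds the CSV files.
    """
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)

    # 1. External text table that points at the existing CSV files.
    create_staging = (
        f"CREATE EXTERNAL TABLE {staging} ({column_defs}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"STORED AS TEXTFILE LOCATION '{csv_dir}'"
    )

    # 2. Final table with the same columns, stored as Parquet.
    create_final = f"CREATE TABLE {final} ({column_defs}) STORED AS PARQUET"

    # 3. Copy the rows; Impala writes them out in Parquet format.
    load_final = f"INSERT INTO {final} SELECT * FROM {staging}"

    return [create_staging, create_final, load_final]


if __name__ == "__main__":
    statements = build_impala_statements(
        "events_csv",
        "events_parquet",
        [("id", "INT"), ("value", "DOUBLE")],
        "/user/etl/events_csv",
    )
    for statement in statements:
        print(statement + ";")
```

Once the Parquet table is loaded, the staging table can be dropped; because it is external, dropping it should leave the CSV files in place.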

I don't know how you convert your file format to CSV, but you might consider writing a program that converts your binary format to Parquet directly. For example, you can write a MapReduce job that writes Parquet files. Here's an example that reads and writes Parquet: https://github.com/cloudera/parquet-examples/blob/master/MapReduce/TestReadWriteParquet.java
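If you go the CSV route first, the conversion step might look something like the sketch below. It assumes a purely hypothetical layout of fixed-width records (a little-endian 32-bit integer followed by a 64-bit float); your real format will differ, so treat `RECORD` as a placeholder to adapt. It uses only the Python standard library.

```python
import csv
import io
import struct

# Hypothetical record layout: little-endian int32 followed by float64.
# "<" selects standard sizes with no padding, so each record is 12 bytes.
RECORD = struct.Struct("<id")


def binary_to_csv(src, dst):
    """Read fixed-width records from binary stream src, write CSV rows to dst.

    Returns the number of records converted.
    """
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    while True:
        chunk = src.read(RECORD.size)
        if not chunk:
            break  # clean end of input
        if len(chunk) != RECORD.size:
            raise ValueError("truncated record at end of input")
        writer.writerow(RECORD.unpack(chunk))
        count += 1
    return count


if __name__ == "__main__":
    # Two in-memory sample records stand in for a real binary file.
    sample = io.BytesIO(RECORD.pack(1, 2.5) + RECORD.pack(2, -0.25))
    out = io.StringIO()
    binary_to_csv(sample, out)
    print(out.getvalue(), end="")
```

With real files you would open the input with `open(path, "rb")` and the output with `open(path, "w", newline="")`, then upload the resulting CSV files to the directory that the external table points at.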
