Load text to ORC file

Problem description

How do I load a text file into a Hive ORC external table?

create table MyDB.TEST (
 Col1 String,
 Col2 String,
 Col3 String,
 Col4 String)
 STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
 OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

I have already created the table above as ORC, but fetching data from it fails with the following exception:

java.io.IOException:org.apache.orc.FileFormatException: Malformed ORC file hdfs://localhost:9000/Ext/sqooporc/part-m-00000. Invalid postscript.

Recommended answer

This takes several steps, detailed below.

  1. Create a Hive table that can read the plain text file. Assuming the file is comma-delimited and stored on HDFS at /user/data/file1.txt, the syntax is as follows.

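-- EXTERNAL keeps the source file in place if this table is dropped later.
-- LOCATION must be a directory, not a file; Hive reads every file in it.
create external table MyDB.TEST (
  Col1 String,
  Col2 String,
  Col3 String,
  Col4 String
)
row format delimited
fields terminated by ','
stored as textfile
location '/user/data/';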

Now you have a schema that matches the format of the data you possess.
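
As a quick sanity check before moving on, the statements below preview the data and show the table's storage metadata. This is a sketch and not part of the original answer. If everything lands in Col1 and the other columns come back NULL, the delimiter probably does not match the file.

```sql
-- Preview a few rows to confirm the delimiter splits the columns as expected.
SELECT * FROM MyDB.TEST LIMIT 5;

-- Show the location, InputFormat and SerDe. A plain text table should report
-- org.apache.hadoop.mapred.TextInputFormat as its InputFormat.
DESCRIBE FORMATTED MyDB.TEST;
```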

  2. Create another table stored as ORC. This is the ORC table you were trying to create earlier; here is a simpler syntax for it.

create table MyDB.TEST_ORC (
  Col1 String,
  Col2 String,
  Col3 String,
  Col4 String)
STORED AS ORC;

  3. Your TEST_ORC table is empty at this point. Populate it with the data from the TEST table using the following command.

INSERT OVERWRITE TABLE MyDB.TEST_ORC SELECT * FROM MyDB.TEST;

This statement selects all records from the TEST table and writes them to the TEST_ORC table. Because TEST_ORC is an ORC table, the data is converted to ORC format on the fly as it is written.
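
The create-then-insert steps above can also be combined into a single CREATE TABLE AS SELECT statement. The following is a minimal sketch, assuming a new table name (TEST_ORC_CTAS is hypothetical) and that the column names and types of TEST should carry over unchanged.

```sql
-- Create the ORC table and fill it from the text table in one statement.
-- TEST_ORC_CTAS is a hypothetical name chosen for this example.
CREATE TABLE MyDB.TEST_ORC_CTAS
STORED AS ORC
AS
SELECT * FROM MyDB.TEST;
```

Hive places some restrictions on CTAS (for example, the target table cannot be external), so the separate create and insert statements remain the more general option.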

You can also check the storage location of the TEST_ORC table to see the ORC files.
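
One way to find that location from within Hive is sketched below. It assumes the table lives in MyDB as in the statements above; the actual path depends on your warehouse configuration.

```sql
-- The Location field in the output is the HDFS directory holding the table's files.
DESCRIBE FORMATTED MyDB.TEST_ORC;

-- INPUT__FILE__NAME is a Hive virtual column holding the file each row was read from.
SELECT DISTINCT INPUT__FILE__NAME FROM MyDB.TEST_ORC;
```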

Now your data is in ORC format, and the TEST_ORC table has the schema needed to read it. You may drop the TEST table now if it is no longer needed.
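
Before dropping anything, it may be worth confirming that the copy is complete. The sketch below compares row counts and then drops the text table. Keep in mind that dropping a managed (non-external) table also deletes its underlying files, while dropping an external table removes only the metadata.

```sql
-- The two counts should match if every row was copied.
SELECT COUNT(*) FROM MyDB.TEST;
SELECT COUNT(*) FROM MyDB.TEST_ORC;

-- Remove the intermediate text table once the counts agree.
DROP TABLE IF EXISTS MyDB.TEST;
```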

Hope that helps!
