Hive INSERT OVERWRITE DIRECTORY command output is not separated by a delimiter. Why?


Problem Description


The file that I am loading is separated by ' ' (white space). Below is the file. The file resides in HDFS:-

001 000
001 000
002 001
003 002
004 003
005 004
006 005
007 006
008 007
099 007

1> I am creating an external table and loading the file by issuing the below command:-

CREATE EXTERNAL TABLE IF NOT EXISTS graph_edges (src_node_id STRING COMMENT 'Node ID of Source node', dest_node_id STRING COMMENT 'Node ID of Destination node') ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION '/user/hadoop/input';

2> After this, I am simply inserting the table in another file by issuing the below command:-

INSERT OVERWRITE DIRECTORY '/user/hadoop/output' SELECT * FROM graph_edges;

3> Now, when I cat the file, the fields are not separated by any delimiter:-

hadoop dfs -cat /user/hadoop/output/000000_0

Output:-

001000
001000
002001
003002
004003
005004
006005
007006
008007
099007

Can somebody please help me out? Why is the delimiter being removed and how to delimit the output file?

In the CREATE TABLE command I tried DELIMITED BY '\t' but then I am getting unnecessary NULL column.

Any pointers would be much appreciated. I am using Hive version 0.9.0.

Solution

The problem is that Hive (up to and including 0.9) does not let you specify the output delimiter for INSERT OVERWRITE DIRECTORY - https://issues.apache.org/jira/browse/HIVE-634. The delimiter is not actually removed: Hive writes the fields separated by its default non-printing ^A (\001) control character, which cat does not display, so the columns appear fused together.
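You can confirm the hidden delimiter is there by dumping the bytes of the output file. The sketch below simulates one line as Hive writes it (the real file would be fetched with `hadoop fs -cat`; the local path here is only for illustration):

```shell
# Simulate one row of Hive's INSERT OVERWRITE DIRECTORY output:
# the two fields are joined by the non-printing ^A (\001) byte.
printf '001\001000\n' > /tmp/hive_out_sample

# Plain cat hides the \001 byte, so the fields look fused:
cat /tmp/hive_out_sample

# od -c exposes every byte, including the \001 between the fields:
od -c /tmp/hive_out_sample
```

On the real cluster the equivalent check would be `hadoop dfs -cat /user/hadoop/output/000000_0 | od -c`.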

The solution is to create an external table for the output (with a delimiter specification) and INSERT OVERWRITE TABLE instead of DIRECTORY.
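If you cannot change the Hive job itself, a post-processing sketch is to stream the file out of HDFS and rewrite each \001 byte as a visible delimiter. The pipeline below is simulated locally with printf; the commented line shows what it would look like against the (question's) real output path:

```shell
# Two rows as Hive emits them, fields joined by the \001 byte,
# rewritten so each ^A becomes a tab character:
printf '001\001000\n002\001001\n' | tr '\001' '\t'

# Against the real cluster this would be:
#   hadoop fs -cat /user/hadoop/output/000000_0 | tr '\001' '\t' > edges.tsv
```

This leaves the Hive output untouched and only reshapes the extracted copy.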

--

Assuming that you have /user/hadoop/input/graph_edges.csv in HDFS,

hive> create external table graph_edges (src string, dest string) 
    > row format delimited 
    > fields terminated by ' ' 
    > lines terminated by '\n' 
    > stored as textfile location '/user/hadoop/input';

hive> select * from graph_edges;
OK
001 000
001 000
002 001
003 002
004 003
005 004
006 005
007 006
008 007
099 007

hive> create external table graph_out (src string, dest string) 
    > row format delimited 
    > fields terminated by ' ' 
    > lines terminated by '\n' 
    > stored as textfile location '/user/hadoop/output';

hive> insert overwrite table graph_out select * from graph_edges;
hive> select * from graph_out;
OK
001 000
001 000
002 001
003 002
004 003
005 004
006 005
007 006
008 007
099 007

[user@box] hadoop fs -get /user/hadoop/output/000000_0 .

Comes back as above, with spaces.
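For readers on newer releases: Hive 0.11 added delimiter support to INSERT OVERWRITE DIRECTORY itself (HIVE-3682), making the intermediate output table unnecessary. A sketch, assuming Hive 0.11 or later (it will not work on the 0.9 release used in the question):

```sql
-- Requires Hive 0.11+ (HIVE-3682)
INSERT OVERWRITE DIRECTORY '/user/hadoop/output'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
SELECT * FROM graph_edges;
```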
