Loading CSV file on Hive Table with String Array


Problem description

I am trying to load a CSV file into Hive where one field is an array of strings.

Here is the CSV file:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]
99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]

I tried creating the table like this:

CREATE TABLE IF NOT EXISTS Article
(
ARTICLE_ID INT,
ARTICLE_NAME STRING,
ARTICLE_AUTHOR STRING,
ARTICLE_GENRE ARRAY<STRING>
);
LOAD DATA INPATH '/tmp/pinterest/article.csv' OVERWRITE INTO TABLE Article;
select * from Article;  

Here is the output I get:

article.article_id  article.article_name    article.article_author  article.article_genre
48  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Health&Fitness"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Photo"]

It takes only one value in the last field, article_genre.

Can someone point out what is wrong here?

Recommended answer

A couple of things:
You are missing the delimiter definition for collection items.
Also, I assume you expect your select * from article statement to return something like this:

48  Snacks that Power Up Weight Loss    Aidan B. Prince ["Health&Fitness","Travel"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["Photo","Travel"]
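
To see why the original load produced a single-element array, the following Python sketch mimics how a delimited text row gets cut into columns. It is only an illustration, not Hive code: split_row is a hypothetical helper, and it assumes the original table was actually using ',' as the field delimiter (the output in the question suggests so) together with Hive's default collection items delimiter, '\002' (Ctrl-B), since none was declared.

```python
# Illustration only: a simplified model of splitting a delimited text row.
# Assumptions: ',' is the field delimiter, and the collection items
# delimiter is Hive's default '\x02' (Ctrl-B) because the table declared none.

def split_row(line, num_columns, field_delim=",", collection_delim="\x02"):
    """Split a text row into columns; the last column is treated as an array."""
    fields = line.split(field_delim)
    # Pieces beyond the declared number of columns are dropped.
    fields = fields[:num_columns]
    # The array column is split on the collection items delimiter.
    fields[-1] = fields[-1].split(collection_delim)
    return fields


row = "48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]"

# The comma inside the brackets is seen as a field delimiter, so the fourth
# column only receives "[Health&Fitness" and "Travel]" is discarded.
print(split_row(row, num_columns=4))

# With a dedicated collection delimiter and no brackets, both items survive.
fixed = "48,Snacks that Power Up Weight Loss,Aidan B. Prince,Health&Fitness|Travel"
print(split_row(fixed, num_columns=4, collection_delim="|"))
```

The brackets have no special meaning in this row format, so they end up as part of the string value.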

I can give you an example, and you can adapt the rest yourself. Here is my table definition:

create table article (
  id int,
  name string,
  author string,
  genre array<string>
)
row format delimited
fields terminated by ','
collection items terminated by '|';

Here is the data:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,Health&Fitness|Travel
99,Snacks that Power Up Weight Loss,Aidan B. Prince,Photo|Travel

Now load the data like this:
LOAD DATA local INPATH '/path' OVERWRITE INTO TABLE article; then run a select statement to check the result.

Most important point:
Define a delimiter for collection items, and don't impose the array syntax you would use in normal programming.
Also, make the field delimiter different from the collection items delimiter to avoid confusion and unexpected results.
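
If the source file is already in the bracketed form from the question, it has to be rewritten into the pipe-delimited layout before loading. A minimal Python sketch of that preprocessing step follows. The helper names and file paths are hypothetical, and it assumes the bracketed list is always the last field and that no item contains ',', '|', or brackets.

```python
import re

# Matches a trailing bracketed list such as "[Health&Fitness,Travel]".
BRACKETED_LIST = re.compile(r"\[([^\[\]]*)\]\s*$")


def to_pipe_delimited(line):
    """Rewrite a trailing "[a,b]" list as "a|b"; other lines pass through."""
    def replace(match):
        items = [item.strip() for item in match.group(1).split(",")]
        return "|".join(items)

    return BRACKETED_LIST.sub(replace, line.rstrip("\n"))


def convert_file(src_path, dst_path):
    """Convert every line of src_path and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(to_pipe_delimited(line) + "\n")


sample = "48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]"
print(to_pipe_delimited(sample))
```

Calling convert_file("article.csv", "article_piped.csv") (both names are placeholders) would produce a file that matches the table definition above, which can then be loaded with LOAD DATA.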
